Thursday, December 04, 2003
Human Visual Memory Representation
So I just had a meeting with my advisor, and there were some fairly interesting ideas discussed.
One goal in robotics is to be able to use vision as a tool in navigation, recognition and a number of other areas. Clearly they are related: "I've seen this place before, I know where I am". But algorithmically, things get very hairy, very fast. For instance, this guy extracts features from a test image, measures parameters of each feature, and matches those feature parameters against training images that went through the same feature extraction. He can then successfully detect an object in a scene if that object has been seen before in the training data.
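To make the matching step a bit more concrete, here is a minimal sketch in Python. It is not the method from the work mentioned above, just plain nearest-neighbor voting. The feature vectors, the labels ("mug", "stapler"), the function names and the `max_dist` threshold are all made up for illustration.

```python
import math
from collections import Counter

def nearest_label(feature, training, max_dist):
    """Return the label of the closest training feature, or None if none is within max_dist."""
    best_label, best_dist = None, max_dist
    for label, train_feature in training:
        d = math.dist(feature, train_feature)  # Euclidean distance between parameter vectors
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def recognize(test_features, training, max_dist=0.5):
    """Let each test feature vote for the object whose training feature it matches best."""
    votes = Counter()
    for feature in test_features:
        label = nearest_label(feature, training, max_dist)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical training data: (object label, feature parameters).
training = [
    ("mug", (0.1, 0.9, 0.3)),
    ("mug", (0.2, 0.8, 0.4)),
    ("stapler", (0.9, 0.1, 0.7)),
]

# Hypothetical features from a new scene: two resemble the mug, one matches nothing.
scene = [(0.12, 0.88, 0.31), (0.21, 0.79, 0.41), (0.5, 0.5, 0.0)]

print(recognize(scene, training))
```

Note that every test feature is compared against every training feature, so the cost grows with the product of the two counts. That is fine for a handful of objects and exactly the problem once you try to remember whole environments.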
But the problem becomes harder when you are dealing with navigation, because the number of features seen grows so large. I mean, human vision operates at around 60Hz, and let's say we find 100 features in each image (a very low estimate). If you woke up six hours ago, you have already seen ~129,600,000 features just today, with plenty of overlap. Now you find an arbitrary scene. Have you seen it before? What if it was 30 years ago?
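The back-of-the-envelope number above is easy to check. A quick sketch in Python, where the six hours awake is what the ~129,600,000 figure implies, and the 16 waking hours per day used for the 30-year estimate is purely my assumption:

```python
FRAME_RATE_HZ = 60          # rough frame rate from the text
FEATURES_PER_FRAME = 100    # the "very low estimate" from the text
SECONDS_PER_HOUR = 3600

def features_seen(hours_awake):
    """Total features seen over the given number of waking hours."""
    return hours_awake * SECONDS_PER_HOUR * FRAME_RATE_HZ * FEATURES_PER_FRAME

today = features_seen(6)    # six waking hours so far today
print(today)                # 129600000

# Assumption: 16 waking hours a day, 365 days a year, for 30 years.
thirty_years = features_seen(16) * 365 * 30
print(thirty_years)         # 3784320000000
```

So under these assumptions a 30-year-old memory has to be picked out of something like 3.8 trillion features, which is why brute-force matching against everything ever seen is not an option.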
I am amazed we even function! How can I possibly remember places, and I mean literally the contours of the walls and the shape of the environment, that I haven't seen for so long? And as far as I know, it is still a mystery how humans represent visual imagery. Is it holographic, like Gibson suggests in 'Neuromancer', 3-D, 2-D, or 10^10-D? Or is it all of these, but on different scales?
My advisor told a story: when his daughter was 3, he stopped at a phone booth when they were lost. A year later, they were driving and his daughter said, "we've been on this street before, you stopped to make a phone call right THERE" and pointed to where the phone booth USED to be, before it was removed after some construction or something. They hadn't been back to that point since. She must have taken in enough information from the scene to identify it without actually seeing the primary item, the phone booth. Simply amazing.
The computational capacity is quite awe-inspiring. I am certain it is related to the power of forgetting combined with our focus of attention. Right now, for a computer, it is cheap never to forget any sensor data, and it wouldn't even know what to throw away...