Peter Norvig at the BYU Family History Technology Workshop
16 March 2006 00:32
Thursday of last week I attended the annual Family History Technology Workshop at BYU. As usual, the conference was excellent, and that, combined with the Computerized Genealogy conference that followed on Friday and Saturday, left my brain overflowing with things to blog about.
Life being what it is, I finally have a little bit of time to do a brain dump of those three days. There was the small matter of buying a new car to meet the needs of our growing family (#4 is on the way soon) to take care of early this week. We ended up buying an '02 Honda Odyssey in beautiful condition. I anticipate lots of fun road trips to come.
But getting back to genealogy, I imagine the slides from each presentation will find their way online. I'd like to set up some pages about the workshop over on WeRelate.org, one for each talk to hopefully foster more discussion among the community (and we need to get more people comfortable using WeRelate.org :-).
Thursday morning started bright and early with a keynote address by Peter Norvig from Google. Peter spoke about "The future of Search". Peter reviewed all the recent fun stuff Google has been doing. He spoke about the tagging movement, and how tagging is meaningful to individuals and to small groups, but not necessarily to the community as a whole. He used the example of the Squared Circle group on Flickr, whose tag has meaning to a larger group, but that's not generally the norm. One simply has to look at the wide variety of tags used on, say, del.icio.us to see this at work.
He asked the question, what advantage do we get from tags?
He spoke of communities, like that surrounding Wikipedia, and reinforced the idea of natural communities. If you want to build a community your best shot is to target these natural communities. Along with that idea, harness the power of normal usage of your product. He pointed to Google Suggest as an example of this. I think Amazon is probably the king of this kind of thing (people who bought X also bought Y, if you're interested in A, you'll probably also like B, and so forth).
My question here is: what data can we harness from the normal usage of online genealogical tools? Online tools should be telling me, "Hey, based on what you've been searching for, you might also be interested in these things. I noticed you downloaded a GEDCOM for this tree; these other people also downloaded the same thing and are interested in sharing research. You've spent a lot of time researching in this particular set of records; here are some other folks who have been here too." And so on. Ancestry.com has taken some steps in this direction, but I think there's so much more that could be done. There's likely a lot of low-hanging fruit in this area.
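To make that a bit more concrete, here is a rough sketch of the "people who downloaded this also downloaded that" idea. This is just my own illustration, not anything Peter showed or anything a real genealogy site does: the download log, the user names, the tree names, and the two function names are all made up.

```python
from collections import Counter

# Hypothetical usage log of (user, tree) download events. Made-up data.
downloads = [
    ("alice", "smith-tree"),
    ("alice", "jones-tree"),
    ("bob", "smith-tree"),
    ("bob", "olsen-tree"),
    ("carol", "smith-tree"),
    ("carol", "jones-tree"),
]

def also_downloaded(log, item, top_n=3):
    """Return the items most often downloaded by users who also downloaded `item`."""
    users = {user for user, it in log if it == item}
    counts = Counter(it for user, it in log if user in users and it != item)
    return [name for name, _ in counts.most_common(top_n)]

def fellow_researchers(log, user):
    """Return the other users who share at least one download with `user`, sorted by name."""
    mine = {it for u, it in log if u == user}
    return sorted({u for u, it in log if it in mine and u != user})

print(also_downloaded(downloads, "smith-tree"))
print(fellow_researchers(downloads, "bob"))
```

Nothing fancy here, just counting co-occurrences in a log the site would already have from normal usage. A real system would presumably want to weight by recency and respect people's privacy preferences, but even simple counts like these would be a start.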
Peter spoke of Knowledge Engineering in the AI field, of building vast storehouses of common knowledge, things like "Water flows downhill." He said it used to be that you couldn't look this kind of stuff up, but that's changing.
He spoke of extracting semantics from the web, and the challenge of doing so from unstructured or semi-structured data. This point, I think, generated a lot of discussion for the day about data formats. In the Q&A portion at the end of his talk, the question of searching databases, the "deep web", was brought up, and how best to address that. I asked whether Google had any plans to support microformats; his response was that they hadn't, because there simply aren't enough pages out there with them to bother. But if enough pages started to use them, then they would probably consider it. That brings us back to the question of data formats and consensus, and the question of whether or not embedded semi-structured data is really needed when so much semantic meaning can already be extracted without such things.
The hard part here, he said, is coming to consensus on such things. Again, you have to go back to what makes a successful community. He believes that successful communities come when the barrier to entry is very low. Don't make me do a lot of stuff (lots of registration screens) just to participate at the bottom level. But you also have to have the deep part for the fanatics (like me :-).
On the idea of coming to consensus, there has been recent discussion about the possibility of self-citing sources to be used in the LDS Church's image delivery system for all the microfilm that's getting scanned and indexed (more on that in later posts). My thought here is that the Church pulls enough weight that most vendors will fall in line with whatever it decides to offer. But let's hope we can all help influence a good decision there. Join the ongoing discussion over at the Taking Genealogy to the Common Person blog.
Stay tuned, there's a lot more info I'd like to write about.