
The robotic mule
Posted on 23 March 2006 00:01 | Permalink

Wow. Two summers ago I hiked into the High Uintas of eastern Utah with a troop of Scouts. My pack on that occasion was well over 50 pounds, and I was rather out of shape for such an adventure. Next time I'm taking one of these:

As one blogger put it, it's eerily lifelike. It needs a head, and a big grin. Those poor scouts who had to suffer through some part of their scouting career as part of the Pedro Patrol:


(img src'd from http://www.geocities.com/t34nwtc/ppatches/)

(/me raises hand) no longer have to be embarrassed by such a goofy mascot :-). Add a couple of sidearms and you have the terminator burro...

Man, I shouldn't blog this late at night ;-).

Reader comments: 0


RSS feeds now available for parts of lds.org
Posted on 17 March 2006 15:47 | Permalink

I received this on the LDS-GEMS mailing list today:

"The Church has recently added RSS capability to several of its key Web pages. To learn more, go to www.lds.org/rss, which contains more detailed information and a continually updated list of the Church's RSS-compatible Web pages."

Woohoo, the feed bug is spreading! (A healthy infection, in my opinion.) Now we just need to see it popping up in the Family History department. At the FHT workshop and the CGC last week I did my best to "pass it along" to those in positions of influence :-).

More info here: http://www.lds.org/library/display/0,4945,6606-1-3386-1,00.html

Reader comments: 2


A smile is universal
Posted on 17 March 2006 11:13 | Permalink

Not sure quite what it is about it, but I love this picture:

http://www.flickr.com/photos/raul/41476707/

Reader comments: 0


More thoughts on the potential Granite Mountain Vault images API
Posted on 17 March 2006 00:23 | Permalink

This is a followup to my previous entry about the possibilities of an API for accessing the images and information from the microfilms in the Granite Mountain Vault.

In that entry, I wondered about the value added by an API over just having permanent links. Today it occurred to me that there would be a lot of value if such an API also exposed the data being extracted by the FamilySearch Indexing project, such as the field names in any given image and the values for each of those fields. Search functions via the API that searched over those fields, along with other metadata (time period, geographical location, etc.), could be very useful indeed.
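To make that concrete, here's a rough Python sketch of the kind of search such an API could enable. Everything here is invented for illustration: the field names, the record shape, and the search parameters are my guesses, not anything FamilySearch has announced.

```python
# A sketch of the kind of search an indexing-aware API could support.
# All field and parameter names here are hypothetical.

def search_images(records, surname=None, place=None, year_range=None):
    """Return image IDs whose extracted fields match the given criteria."""
    matches = []
    for rec in records:
        fields = rec["fields"]
        if surname and fields.get("surname") != surname:
            continue
        if place and rec["metadata"].get("place") != place:
            continue
        if year_range:
            year = fields.get("year")
            if year is None or not (year_range[0] <= year <= year_range[1]):
                continue
        matches.append(rec["image_id"])
    return matches

# Example: two indexed census images
records = [
    {"image_id": "img-001",
     "fields": {"surname": "Doe", "year": 1851},
     "metadata": {"place": "Sussex, England"}},
    {"image_id": "img-002",
     "fields": {"surname": "Hanks", "year": 1860},
     "metadata": {"place": "Utah, USA"}},
]

print(search_images(records, surname="Doe", place="Sussex, England",
                    year_range=(1840, 1860)))
```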

Reader comments: 3


Microsoft Live Clipboard and genealogy
Posted on 16 March 2006 17:07 | Permalink

Jon Udell writes today about Microsoft "Live Clipboard", and the idea is very compelling: being able to cut/copy/paste between desktop apps and objects on web pages, and having it all do "the right thing".

Of course I have to put the genealogy spin on things. I'd like to be able to go to a site that lets me browse pedigrees and copy and paste individuals (or whole trees, for that matter) into my desktop record manager, be it PAF, Legacy, RootsMagic or whatever. I'd like to be able to go to the LDS Church's new microfilm image delivery system and be able to copy and paste images as sources into my desktop record manager (without having to right-click, Save image as, etc., etc.). What about GEDCOM upload/download? Well, that could be happening in the background, but cut/copy/paste comes much more naturally to me, and, I'd wager, to a lot of folks.

Reader comments: 1


101 things I could/would do with all the images from the Granite Mountain Vault
Posted on 16 March 2006 01:17 | Permalink

This actually comes later in the sequence of Thursday's conference, but I want to get it out so I can point some folks to it. Shane Hathaway gave a presentation about his work on a long-term storage system for all the images that will be generated from the microfilms in the Granite Mountain Records Vault. More details from that in a later post.

There are endless 'cool' (not to mention useful) things that could be done with the images in the vault if the delivery system built by the Church includes a web services API. I can imagine a number of 'meta-databases' growing up around this collection of images (a la del.icio.us and the web).

As one idea, a simple tagging database could be built that could associate any given set of tags with any set of images by user. If each image has its own permanent URL (I surely hope so!), then this kind of database would be trivial to implement, and almost wouldn't even require an API. Taking the idea a step further, I imagine a meta database that would allow me to tag and link to regions of images, as specified by X,Y coordinates. This would let me link to a specific line in, say, a census record.
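A toy sketch of that region-tagging idea, in Python. The permanent URLs and the `#region=x,y,w,h` fragment syntax are made up for illustration:

```python
# A minimal sketch of a tag store for image regions, assuming only that each
# image has a permanent URL. The fragment syntax (#region=x,y,w,h) is invented.

from collections import defaultdict

tags = defaultdict(set)  # (image_url, region) -> set of (tag, user) pairs

def tag_region(image_url, region, tag, user):
    """Associate a tag with a rectangular region (x, y, w, h) of an image."""
    tags[(image_url, region)].add((tag, user))

def region_link(image_url, region):
    """Build a shareable link to a specific region of an image."""
    x, y, w, h = region
    return "%s#region=%d,%d,%d,%d" % (image_url, x, y, w, h)

# Tag one line of a (hypothetical) census image and link straight to it
census_line = ("http://example.org/images/film123/frame42", (0, 310, 900, 24))
tag_region(*census_line, tag="john-doe-household", user="dhanks")
print(region_link(*census_line))
```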

I'm wrestling with what specific value a web services API for the image delivery system would provide, if each image had its own specific URL. There's a lot of power in simply being able to reliably link to any given resource on the web. Perhaps the real value would be in a web service that provided metadata details about each image (which microfilm it came from, the sequence number within the microfilm, which geographic locations are covered in the image, which time period is covered in the image, all the publication and citation information from the associated Family History Library Catalog entry, and so forth).

Some ideas I posted on the LDSOSS mailing list a while back included the following:

"one could build a service around this hypothetical API such that users could create RDF 'semantic-web' data associated with [or extracted from] each image, such as 'John Doe was born in Sussex, England in 1815', which would then be machine-readable, much the way new facts can be deduced in a relational database from the basic information stored in relations. Imagine being able to query 'show me all images for a John Doe born in Sussex, England within 5 years of 1815'."
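To show the shape of that kind of query without a full RDF stack, here's a toy Python version that stores assertions as subject/predicate/object triples and answers the "John Doe born in Sussex within 5 years of 1815" question. A real system would use an RDF store and a query language like SPARQL; all the identifiers here are invented:

```python
# Toy illustration of the RDF idea: store extracted assertions as
# subject/predicate/object triples and query across them.

triples = [
    ("person:john-doe", "bornIn", "Sussex, England"),
    ("person:john-doe", "birthYear", 1815),
    ("person:john-doe", "appearsIn", "image:film123/frame42"),
    ("person:jane-roe", "bornIn", "Kent, England"),
    ("person:jane-roe", "birthYear", 1817),
]

def images_for(name_prefix, born_in, year, tolerance):
    """Find images for people born in a place within +/- tolerance years."""
    people = {s for s, p, o in triples if p == "bornIn" and o == born_in}
    people &= {s for s, p, o in triples
               if p == "birthYear" and abs(o - year) <= tolerance}
    people = {s for s in people if s.startswith(name_prefix)}
    return sorted(o for s, p, o in triples
                  if p == "appearsIn" and s in people)

# "show me all images for a John Doe born in Sussex, England within 5 years of 1815"
print(images_for("person:john", "Sussex, England", 1815, 5))
```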

But really, such a thing could be done by simply linking to a permanent URL. Imagine if, in parallel with the FamilySearch Indexing work going on to extract the literal text from these images, someone (like me?) built a database system/service that allowed a community to extract and associate semantic data (stored as RDF) with any image in the system. The Church wouldn't need to develop this; all that would be needed would be permanent URLs for the images (though an API to grab all the metadata would be very helpful, too).

What kinds of things would you like to be able to do with an API for this system?

As a final thought, I really hope an RSS/Atom feed of new images coming online is made available. A customizable feed that lets me indicate I'm interested in these specific geographic regions and these time periods, and then gives me a custom link to a feed for images matching these criteria would be utopia over the next few years.
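Here's a minimal sketch of how that kind of customizable feed might filter new images against a subscriber's saved criteria. The criteria fields and image metadata are entirely hypothetical:

```python
# Sketch of the customizable-feed idea: a saved set of criteria that newly
# digitized images are matched against as they come online. Field names invented.

def matches(criteria, image):
    """Does a new image match a subscriber's saved regions and time period?"""
    if image["region"] not in criteria["regions"]:
        return False
    start, end = criteria["period"]
    return start <= image["year"] <= end

my_criteria = {"regions": {"Sussex, England", "Kent, England"},
               "period": (1800, 1860)}

new_images = [
    {"id": "img-101", "region": "Sussex, England", "year": 1851},
    {"id": "img-102", "region": "Utah, USA", "year": 1856},
]

# Only matching images would land in my personalized feed
feed_items = [img["id"] for img in new_images if matches(my_criteria, img)]
print(feed_items)
```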

Reader comments: 0


Peter Norvig at the BYU Family History Technology Workshop
Posted on 16 March 2006 00:32 | Permalink

Thursday of last week I attended the annual Family History Technology Workshop at BYU. As usual the conference was excellent and that combined with the Computerized Genealogy conference that followed on Friday and Saturday left my brain overflowing with things to blog about.

Life being what it is, I finally have a little bit of time to do a brain dump of those three days. There was the small matter of buying a new car to meet the needs of our growing family (#4 is on the way soon) to take care of early this week. We ended up buying an '02 Honda Odyssey in beautiful condition. I anticipate lots of fun road trips to come.

But getting back to genealogy, I imagine the slides from each presentation will find their way online. I'd like to set up some pages about the workshop over on WeRelate.org, one for each talk to hopefully foster more discussion among the community (and we need to get more people comfortable using WeRelate.org :-).

Thursday morning started bright and early with a keynote address by Peter Norvig from Google. Peter spoke about "The future of Search". Peter reviewed all the recent fun stuff Google has been doing. He spoke about the tagging movement, and how tagging is meaningful to individuals and to small groups, but not necessarily to the community as a whole. He used the example of the Squared Circle group on Flickr, which has meaning to a larger group, but that's not generally the norm. One simply has to look at the wide variety of tags used on say, del.icio.us, to see this at work.

He asked the question, what advantage do we get from tags?

He spoke of communities, like that surrounding Wikipedia, and reinforced the idea of natural communities. If you want to build a community your best shot is to target these natural communities. Along with that idea, harness the power of normal usage of your product. He pointed to Google Suggest as an example of this. I think Amazon is probably the king of this kind of thing (people who bought X also bought Y, if you're interested in A, you'll probably also like B, and so forth).

My question here is: what data can we harness out of normal usage of online genealogical tools? Online tools should be telling me, "Hey, based on what you've been searching for, you might also be interested in these things. I noticed you downloaded a GEDCOM for this tree; these other people also downloaded the same thing and are interested in sharing research. You've spent a lot of time researching in this particular set of records; here are some other folks who have also been here." And so on. Ancestry.com has taken some steps in this direction, but I think there's so much more that could be done. There's likely a lot of low-hanging fruit in this area.

Peter spoke of Knowledge Engineering in the AI field, of building vast storehouses of common knowledge, things like "Water flows downhill." He said it used to be that you couldn't look this kind of stuff up, but that's changing.

He spoke of extracting semantics from the web, and the challenge of doing so from non- or semi-structured data. This point, I think, generated a lot of discussion for the day about data formats. In the Q&A portion at the end of his talk, the question of searching databases (the "deep web") was brought up, and how best to address it. I asked whether Google had any plans to support microformats; his response was that they hadn't, since there simply aren't enough pages out there using them to bother. But if enough pages started to use them, they would probably consider it. That brings us back to the question of data formats and consensus, and the question of whether embedded semi-structured data is really needed when so much semantic meaning can already be extracted without it.

The hard part, he said, is coming to consensus on such things. Again, you have to go back to what makes a successful community. He believes that successful communities come when the barrier to entry is very low. Don't make me do a lot of stuff (lots of registration screens) just to participate at the bottom level. But you also have to have the deep part for the fanatics (like me :-).

On the idea of coming to consensus, there has been recent discussion about the possibility of self-citing sources to be used in the LDS Church's image delivery system for all the microfilm that's getting scanned and indexed (more on that in later posts). My thought here is that the Church pulls enough weight that most vendors will fall in line with whatever it offers. But let's hope we can all influence a good decision there. Join the ongoing discussion over at the Taking Genealogy to the Common Person blog.

Stay tuned, there's a lot more info I'd like to write about.

Reader comments: 2


Self-citing sources for genealogical research
Posted on 07 March 2006 01:29 | Permalink

Today on the LDSOSS mailing list, Dan Lawyer sent a link to his new blog, "Taking Genealogy to the Common Person", the purpose of which is described thus:

A clear majority of people on this earth want to know more about their ancestors. In spite of their innate interest, they are often overwhelmed at the complexity of the process and underwhelmed by the experience. This blog is a forum for promoting innovation that will help to take family history to the common person.

A worthy cause indeed. His inaugural post brings up the idea of self-citing internet sources in genealogical research. He proposed some kind of tag markup in a given page indicating source information for the image or content displayed on that page, and opened the challenge to standardize some kind of universal format. He places all this in the context of the digitized image delivery system the LDS Church is building to provide online access to digitized versions of its more than 2 million microfilms.

I commented that microformats might be a good candidate to handle this kind of thing. On the microformats wiki, there is already a potential citation microformat that may be just what we're looking for.

In my comments, (which I include, slightly edited, below), I brainstormed several possibilities these self-citing pages could provide. Exciting stuff indeed.

My comments:

I think microformats might fit the bill for what you're looking for here:

"Designed for humans first and machines second". Microformats are essentially snippets of structured (X)HTML that are machine-readable, embedded in pages that are human-readable.

It appears there is already work in progress to develop a citation microformat (http://www.microformats.org/wiki/citation).

I can think of a number of ways this kind of thing would be useful. As one example, imagine browsing through the digitized images and being able to click a browser bookmarklet (http://en.wikipedia.org/wiki/Bookmarklet) that does something like "Associate the source document displayed on this page with an individual/marriage/event/etc. in my account in the Church's new FamilyTree system." Clicking the bookmarklet would scan the current page for the microformat, and then lead me to a page in the FamilyTree system that would let me select the individual/event/marriage with which to associate the image as a source.

Of course, the church could also just put a link on each image display page that does just that, but a microformat would allow systems from different vendors to interact with these sources. A genealogy program could allow me to paste in a URL from which it could automatically extract the source information. Or going one step further, genealogy tool providers (RootsMagic, The Master Genealogist, Legacy, et al.) could provide browser plugins or browser toolbars that automatically detect these self-citing pages, and which could offer actions similar to the example above.
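As a rough illustration of that extraction step, here's a small Python sketch that pulls citation fields out of a self-citing page using only the standard library. The class names (`hcite`, `title`, `repository`, `film-number`) are illustrative; the citation microformat is still a draft and may well settle on different ones:

```python
# Sketch: how a desktop genealogy tool might extract source information from
# a self-citing page. Class names here are invented for illustration.

from html.parser import HTMLParser

class CitationExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for c in ("title", "repository", "film-number"):
            if c in classes:
                self._current = c

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

# A hypothetical self-citing image display page
page = """<div class="hcite">
  <span class="title">1851 England Census, Sussex</span>,
  <span class="repository">Family History Library</span>,
  film <span class="film-number">123456</span>.
</div>"""

parser = CitationExtractor()
parser.feed(page)
print(parser.fields)
```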

Now if the CONTENTdms of the world (CONTENTdm being digital library software used by BYU and many other digital libraries) could do similar things with their image display pages, we'd really be moving somewhere.

I think if the Church, with its weight and influence, were to adopt such a standard, we'd potentially see a lot of other digital content providers begin to follow suit.

As to the concern of having to "load the whole document" just to get the source citation: HTTP allows one to fetch a page's text content without also fetching the page's images, so I don't see this as (too big of) a problem, unless the text on the page itself is also very large. In a digitized image delivery system, I think the pages would be fairly lightweight as far as text goes.

And why stop with just citations? One possibility is to create a "GEDCOM microformat" in which linkage and other genealogical data can be embedded in machine-readable form in human-readable pages (http://www.microformats.org/wiki/genealogy-formats). If each software program and online family tree system that displays pedigrees and other genealogical information were to include these microformats, there would be a huge number of possibilities for how such a format could be used:

- Tools vendors could again provide bookmarklets, toolbars, or browser plugins to do things like, "Import the individual on this page (and all their ancestors) into my genealogy database." Clicking on such a button/link would popup the user's genealogy tool of choice, which would pull down the page in question, parse the microformatted data, and chase the resulting tree.

- Plugins could be provided to display lists of individuals/marriages/events/sources that are on the currently displayed page in a sidebar, each with options to import and/or process with the user's tool of preference.

- Search engines and aggregators could do automated match/merge of individuals they parsed out from spidered pages, offering suggestions to their users as to pages related to their research.

Again, if the Church were to get behind this kind of effort, starting with the (publicly accessible) portion of the new FamilyTree system, a lot of other vendors would soon follow.

Let's do it!

...

Some more thoughts:

Some advantages of microformats:

- They're invisible to the average user.
- Yet they provide so many possibilities to tool providers and geeks like me (and through them/us to the masses via the tools they/we build).
- I don't see them as being that difficult to add/implement in most of the tools out there that generate HTML, once we have a good standard established (that's the hard part :-).

One more application idea along the lines of a "GEDCOM microformat":

Imagine if the Church's new Family Tree system published RSS/Atom feeds of activity happening on people's trees. Each time a person was added, for example, an item could be added to a feed for that tree or account (if such info was safe to publish, e.g., the new person was not living). And if those feed items embedded these microformats, I could subscribe to the feeds with my aggregator (such as bloglines.com). As new individuals were added to my trees of interest, they would show up in my aggregator, and the lovely browser plugins (this assumes people are using web-based aggregators) would pick up on the microformats in the feeds I'm looking at, offering all the options to import, etc.
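As a rough picture of what one such feed entry might look like, here's a Python sketch that builds an Atom entry with a (completely hypothetical) genealogy microformat embedded in its content. The entry ID, class names, and person details are all invented:

```python
# Sketch: one Atom entry in a "new person added to this tree" feed, with a
# hypothetical genealogy microformat embedded in the (escaped HTML) content.

import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

entry = ET.Element("{%s}entry" % ATOM)
ET.SubElement(entry, "{%s}title" % ATOM).text = "New person: John Doe (1815-1880)"
ET.SubElement(entry, "{%s}id" % ATOM).text = "urn:example:tree/42/person/7"
content = ET.SubElement(entry, "{%s}content" % ATOM, {"type": "html"})
# The class names below are invented -- no genealogy microformat exists yet.
content.text = ('<span class="gen-person">'
                '<span class="name">John Doe</span>, b. '
                '<span class="birth-year">1815</span>, '
                '<span class="birth-place">Sussex, England</span></span>')

print(ET.tostring(entry, encoding="unicode"))
```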

Dan responded, saying

Microformats looks like a possible option for what we need to do. I'll spend some time learning about it.

I've thought about the value of having RSS or Atom feeds from people in the Family Tree. Seems like a powerful concept. There is a question of granularity on such a feed. Is it scoped to a person, family, family line, n number of generations, etc.?

The microformat concept for genealogy has some potential also. It would definitely need to be coupled with a citation capability; otherwise there's a risk that we make it easier to propagate unsubstantiated pedigrees.

To which my response was:

Offer flexible levels of granularity, a la del.icio.us or flickr.com.

For example, on del.icio.us (a social bookmarking site), I can subscribe to feeds of:

- all recent urls being submitted
- all popular urls being submitted (urls that are getting submitted most frequently)
- urls being tagged with a specific tag (e.g., 'linux')
- urls being submitted by a particular user
- urls being tagged with a specific tag by a specific user

and so forth.

For an example of a usage of that last kind of feed: one of the sidebars on my website brainshed.com is generated by slurping down and parsing the feed of all urls I have submitted and tagged with 'perl_module'.

Flickr offers similar functionality for various aspects of their photo service.

As another example for feed possibilities, Yahoo provides RSS feeds for search results. So I can subscribe to an RSS feed based upon the search results of say 'hanks genealogy', and theoretically, (although it doesn't quite work like I'd like) be notified any time a new search result pops up for those search terms.

So, for FamilyTree, it would be fun to have:

- an RSS feed for all changes being made by a particular user (given the user's permission, etc.)
- a feed for all changes to a particular individual or set of individuals
- a feed for all changes to an individual or any of his ancestors for N generations
- a feed for all changes to an individual or any of his descendants for N generations
- a feed for any sources that are added to an individual, a set of individuals, an individual and his descendants, an individual and his ancestors, etc.
- a feed for all new digitized images coming online (i.e., one entry in the feed when images from FHL #123456789 become generally available)
- a personalized feed for any disputes that are submitted for info in any of my lines
- and so forth :-)

Now, I don't envy the developers who have to build the backend for such a system, but I don't see it being too hard. Somehow you log the changes being made in the system, and for each change you determine which feed interests (see the list above) that change applies to (you'd also have to determine whether the change is private and shouldn't be made publicly available). Then you'd have an application/CGI/etc. that takes incoming HTTP requests for feeds and dynamically determines which of the change events need to go in each feed requested (with plenty of caching involved, of course).

Granted, that's probably a simplistic view of what would be needed to implement such a system, but I hope you get the idea. Make the set of feeds available infinitely (or nearly so...) customizable, and we'll all probably be surprised at the variety of uses that arise from the availability of these feeds.
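For what it's worth, the core of that simplistic view (match each change event against saved feed subscriptions) might be sketched like this; the subscription kinds and field names are invented:

```python
# Simplistic sketch of the feed backend: for each logged change event, decide
# which saved feed subscriptions it should appear in. All names invented.

def feeds_for(change, subscriptions):
    """Return the IDs of subscriptions a change event should appear in."""
    result = []
    for sub_id, sub in subscriptions.items():
        if change.get("private"):
            continue  # private changes never reach public feeds
        if sub["kind"] == "by-user" and change["user"] == sub["user"]:
            result.append(sub_id)
        elif sub["kind"] == "by-person" and change["person"] in sub["persons"]:
            result.append(sub_id)
    return result

subscriptions = {
    "feed-1": {"kind": "by-user", "user": "dhanks"},
    "feed-2": {"kind": "by-person", "persons": {"john-doe", "jane-roe"}},
}

change = {"user": "dhanks", "person": "john-doe", "private": False}
print(feeds_for(change, subscriptions))
```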

This is getting long, but I wanted to include it all here, as I've been giving a lot of thought to this kind of stuff lately. I really like the ideas behind sites like edgeio.com and inods.com, where instead of one player (like Amazon) holding all the data, we each own our own data, host it wherever we want, and it is then spidered and aggregated into useful tools. I'm not saying I don't like Amazon (I love Amazon!), but I do like the idea of being able to keep all my data in one place, instead of having to post reviews here as well as on Amazon. If Amazon wanted to be really bold, it could do its own aggregation a la inods, and add lists of links to reviews from blogs on its product pages. They would lose some of the editorial control they now exert, but would gain, in my opinion, some very good content.

Applying these datalibre ideas to genealogy, I could host my genealogy data wherever I want (even on the new FamilyTree system), and if it were all microformatted, then any search engine or aggregator (or whatever tools we haven't thought of yet) could still use it in meaningful ways.

Reader comments: 0


Utah Digital Newspapers
Posted on 15 February 2006 17:27 | Permalink

Utah Digital Newspapers

Wow. This site makes available digital versions (fully searchable) of past issues of a number of Utah newspapers, including the Deseret News from 1850-1898. If you have ancestors in early Utah, this is a gold mine.

Reader comments: 0


OpenQRM - Open Resource Manager
Posted on 01 February 2006 15:59 | Permalink

Found this via InfoWorld today: http://www.openqrm.org/

At my current job, I have spent a fair amount of time developing a somewhat similar system, which I call the "Host Database". It stores information about servers and network gear, keeping all the info in a PostgreSQL backend with a Mason frontend. We use it to keep track of servers, their associated disks, network interfaces, and other related info, but it can also catalog network subnets, track switch port connections and VLANs, serve DNS (via MyDNS), and spit out kickstart files for automated installation (among other things). I'd love to open-source the thing.
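Just to give a flavor of the idea, here's a stripped-down sketch of the sort of schema and query the Host Database supports, using SQLite in place of the real PostgreSQL backend. The table layout here is simplified and invented for illustration, not the actual schema:

```python
# Stripped-down sketch of a "Host Database": servers, their network
# interfaces, and a join answering "what lives on this subnet?"

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT, role TEXT);
    CREATE TABLE interfaces (
        host_id INTEGER REFERENCES hosts(id),
        ip TEXT, subnet TEXT, vlan INTEGER);
""")
db.execute("INSERT INTO hosts VALUES (1, 'web01', 'webserver')")
db.execute("INSERT INTO hosts VALUES (2, 'db01', 'database')")
db.execute("INSERT INTO interfaces VALUES (1, '10.0.1.10', '10.0.1.0/24', 101)")
db.execute("INSERT INTO interfaces VALUES (2, '10.0.2.10', '10.0.2.0/24', 102)")

# Which hosts have an interface on the 10.0.1.0/24 subnet?
rows = db.execute("""
    SELECT h.name, i.ip FROM hosts h
    JOIN interfaces i ON i.host_id = h.id
    WHERE i.subnet = '10.0.1.0/24'
""").fetchall()
print(rows)
```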

I may spend some time with OpenQRM, and see if there are any tips I can take away from it. It appears with OpenQRM you can create custom boot images which can be served out to machines via PXE, allowing you to essentially build a diskless cluster.

Reader comments: 0



Me

Daniel Hanks

I'm a system administrator working for Omniture

Interested in

perl
books
python
databases
genealogy
astronomy
digital archival
digital libraries
web applications
web infrastructure
distributed storage

among other things . . .

Storyteller


Pamela Hanks

is an excellent storyteller.

(She also happens to be my wife :-)

A storyteller makes a wonderful and unique addition to family, school, church or other group events. Schedule her for your next gathering.




LDSOSS
LDS Open Source Software
A website discussing the use of Open-source software for applications useful to those sharing values of the Latter-day Saint (Mormon) faith.

© 2009, Daniel C. Hanks