FamilySearch Indexing
Posted on 12 September 2005 16:04 | Permalink
Last week at the 2005 Conference of the Federation of Genealogical Societies (FGS), the LDS Church announced (Deseret News Article) that it has undertaken a project to digitize and index its entire collection of over 2 million genealogical microfilms, eventually making the content of those films available online via the Internet.
I don't know about you, but online access to all those microfilms makes my heart beat a bit faster.
To extract and index the information, the church will be using a volunteer-driven process somewhat akin to what I hoped for in this post earlier this year. Users will be able to use a Java program through their web browsers to extract information from digitized microfilm images.
Another article describes the process with a bit more detail.
If you're interested in participating, visit FamilySearch Indexing. More people working on this means more information available sooner (hopefully!).
Reader comments: 1
Shell scripting in cron environments
Posted on 24 August 2005 13:05 | Permalink
I'm going to publicly humble myself here by demonstrating my capacity to forget things I've learned before, in the hope of never forgetting them again. I've been writing some backup scripts for Oracle in Bourne shell which dynamically create backup scripts for Oracle Recovery Manager (rman), and then invoke rman, feeding it the backup script that was just created. In testing these scripts I ran into the classic problem of a script that works fine from the command line but not when invoked from cron. The script would run, but rman would bail with odd, unhelpful errors.
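For context, the general shape of these scripts is something like the sketch below. The paths, file names, and RMAN commands are illustrative placeholders, not the actual production script:
#!/bin/sh
# Sketch only: generate an rman command file on the fly, then feed it to rman.
RMAN_CMDFILE=/tmp/rman_backup.$$

cat > $RMAN_CMDFILE <<EOF
connect target /
run {
  backup database plus archivelog;
}
EOF

rman cmdfile=$RMAN_CMDFILE log=/tmp/rman_backup.log
rm -f $RMAN_CMDFILE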
This smelled suspiciously like environment variables not being set. My first course of action was to recognize that not everything in the command-line environment exists in the cron execution environment, so I dutifully set environment variables in my script to provide things like ORACLE_HOME, ORACLE_SID, and PATH:
ORACLE_HOME=/path/to/oracle/product/oravers/
ORACLE_SID=DB_SID
PATH=$PATH:$ORACLE_HOME/bin
But this still didn't fix the problem. The script would run, but rman still behaved as though it wasn't seeing the environment it wanted. So I looked at another backup script I had written months ago, and found my problem.
You see, the environment variables I set as above were fine for anything that needed to access them within the scope of the current process. The moment I forked off another process, or ran another command (like, say, rman), those environment variables would be invisible to any child processes created by my script. How to resolve this problem? Export the variables:
ORACLE_HOME=/path/to/oracle/product/oravers/; export ORACLE_HOME
ORACLE_SID=DB_SID; export ORACLE_SID
PATH=$PATH:$ORACLE_HOME/bin; export PATH
By doing this, these variables were then visible to any child processes (like rman) that I might run from my script.
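A quick way to see the difference (an illustrative snippet, not part of the backup script):
FOO=bar
sh -c 'echo "child sees: $FOO"'   # prints nothing for $FOO -- not exported
export FOO
sh -c 'echo "child sees: $FOO"'   # now prints "bar" -- the child inherits it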
Well-seasoned shell scripters will smile at me and say, "well, duh!" In my defense, most of my scripting these days is in Perl, but at any rate, I hope that blogging about this will cement it in my mind enough to remember it 6 or 12 months down the road when I'm writing another similar script.
Reader comments: 2
Oracle Instant Client
Posted on 26 April 2005 16:07 | Permalink
A co-worker today asked me if there was an Oracle client available for Mac OS X. I was skeptical, but after a little digging came up with Oracle Instant Client, in which Oracle provides client libraries, ODBC drivers, development files (makefiles and header files), and sqlplus for a variety of platforms, including Mac OS X (alas, no FreeBSD). They even provide RPMs for Linux systems. And all this for free!
Hot Dog!
Up until this time I had been rolling my own rpms using tarballs made from manual client installs. This will make things very nice.
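For anyone curious, getting a zip-file download of Instant Client plus sqlplus working is roughly the following. The file names, directories, and connect string below are illustrative and vary by version and platform:
mkdir -p /opt/oracle && cd /opt/oracle
unzip instantclient-basic-10.1.0.3.zip      # the zips extract into a versioned instantclient directory
unzip instantclient-sqlplus-10.1.0.3.zip

# Instant Client has no ORACLE_HOME; it only needs to find its shared libraries.
# Adjust the path to match whatever directory the zips actually extracted into.
LD_LIBRARY_PATH=/opt/oracle/instantclient; export LD_LIBRARY_PATH   # DYLD_LIBRARY_PATH on Mac OS X
PATH=$PATH:/opt/oracle/instantclient; export PATH

# Connect with the easy-connect syntax -- no tnsnames.ora required:
sqlplus scott/tiger@//dbhost.example.com:1521/ORCL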
Oracle provides a fantastic database engine, but their install process is a royal pain. This is a great step in the right direction. Can we get the same now for the server install???
Reader comments: 0
Nagios Configuration
Posted on 08 April 2005 20:19 | Permalink
http://nagios.sourceforge.net/docs/2_0/xodtemplate.html
I'm posting a useful (for me anyways) link to the "Template-Based Object Configuration" for the Nagios host monitoring system.
Over the last few years in my job, I've been developing (in my "spare" time) a host management system, useful for storing information about the machines I manage. This grew out of an effort to have a system that would auto-generate kickstart files for RedHat Linux installations using PXE, which it does quite well. Eventually I'd like to open-source the project.
I've been thinking for a while that it would be nice to add a monitoring piece to this, so I've been reading over the Nagios docs and learning about the "objects" it uses in its configuration. It doesn't look like it would necessarily be "easy" to have Nagios use a database for a backend, so perhaps an easier approach would be to have my system dynamically spit out a Nagios config based on the information in the database.
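Roughly what I have in mind, assuming (for the sake of the example) a MySQL backend. The hostdb database, the hosts table and its columns, and the linux-server template are hypothetical placeholders for whatever the schema actually holds:
#!/bin/sh
# Sketch: query the host database and emit Nagios template-based host definitions.
OUT=/etc/nagios/hosts-generated.cfg

mysql -N -B -e 'SELECT hostname, ip_address FROM hosts' hostdb |
while read HOSTNAME ADDRESS
do
    cat <<EOF
define host {
    use        linux-server
    host_name  $HOSTNAME
    address    $ADDRESS
}

EOF
done > $OUT
The same loop could emit service and hostgroup definitions, and a cron job could regenerate the file and reload Nagios whenever the database changes.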
Why not just use the existing Nagios CGI frontend, you ask? I personally loathe it, and the fact that I can't add hosts, services, and such via that CGI front-end (as far as I can tell) only makes it worse.
Reader comments: 0
Distributed genealogical record extraction
Posted on 29 March 2005 00:38 | Permalink
Last week I attended the Family History Technology Workshop at BYU, and came away with my head abuzz about all the cool things underway in the genealogical community in terms of emerging technologies.
A common thread through several of the presentations was that of data extraction and indexing. There is so much genealogically rich information out there that sits unavailable to most on dusty archive shelves. There are two main processes that separate that data from the rest of the general Internet audience:
1. You first have to scan or image the "offline" materials to make them available online.
2. Once they're in a digital format they then need to be transcribed or "extracted" so as to make the information indexable and ultimately searchable.
An idea I had with regard to the extraction step is inspired by Project Gutenberg's Distributed Proofreaders. The Distributed Proofreaders website enables anyone with a web browser and an Internet connection to help add new etexts to the Project Gutenberg archive by proofreading small sections of OCRed text. The user logs into the website and is presented with the scanned image and a text box of OCRed text. The user then makes any corrections to the text and submits the form.
So if we take that idea and apply it to genealogical records, I think we have the potential to make a lot of information available online that was previously "locked away" (so to speak) in microfilm or other "offline" media.
At the conference, a presenter from the LDS Church's Family History Department spoke about digitization efforts underway at the church. My dream would be to see the entire collection of the church's microfilm records available for viewing online. But as I mentioned above, digitization only gets you halfway there. Once digitized, the church will have a large amount of data that will need to be extracted from the digital images so as to be indexable and searchable. That's where the distributed proofreading comes in.
You could apply the PGDP approach and allow volunteers to sign in to a website where they would be presented with a scanned image and a text box in which to enter a transcription of the text found in the image.
Now let's take that idea a step further: instead of waiting for users to remember to log in, let's proactively send them a daily email (or at whatever rate the user prefers) with the next image for them to transcribe and an HTML form in which they can enter the transcription. Or perhaps provide a customized RSS feed they can subscribe to with their newsreader, which would deliver the same thing on a recurring basis. My wife suggested the wise addition of a rate-limiting mechanism by which a new email or feed item would only be "sent" upon completion of the "pending" item. That way you wouldn't get a whole lot of these things stacked up in your inbox.
Reader comments: 1
MySQL replication woes
Posted on 10 March 2005 15:48 | Permalink
I've been fighting with a MySQL replication pair for a shared-hosting environment (multiple users, each with their own database on the machine). The problem is that replication keeps stopping because of regular old user errors: trying to create a table that already exists, insert statements with fields that don't exist, and so forth. A user tries to create a table that already exists on the master, the erroneous statement gets propagated over to the slave, and the slave throws up its hands and refuses to do any further replication until an admin comes by to manually fix the problem. Uggh.
So I run on over to mysql.com and consult the docs to see if this is "normal" behavior. Looking at the docs on replication "features and known problems" (http://dev.mysql.com/doc/mysql/en/replication-features.html) I find the following:
"If a statement on the slave produces an error, the slave SQL thread terminates, and the slave writes a message to its error log. You should then connect to the slave manually, fix the problem (for example, a non-existent table), and then run START SLAVE."
*sigh*
One of the things that I really dislike about MySQL is how they tout these cool features, but once you get into the nitty-gritty, you always seem to come up with "features" like these that leave a sour taste in your mouth. Want to use their clustered database? Now we can compete with Oracle's Real Application Clusters, they say. Oh, but you have to store your entire database in memory across the cluster. Sour. To be fair I hear they're working on a system where such is not a requirement, but it's still frustrating to get excited about such a system, only to run into these limitations.
My guess is that in the case of replication this is a result of MySQL's replication being logical replication (i.e., just replaying a bunch of SQL statements on the slave) as opposed to physical replication (keeping track of transactional changes, before-and-after data, and applying those to the physical structures of the database). (I borrow the terms "logical" and "physical" here from Oracle's hot-standby terminology.) Working with Oracle databases in physical standby environments, I have never run into a problem like this; it "just works". I wonder whether this turns out to be a problem in Oracle's logical standby mode, however. Knowing Oracle's flexibility, I'd wager you can configure whether or not to have replication die in such a situation.
Bad MySQL, no biscuit :-(.
Update: After some more consultation with the docs, it turns out you can use the --slave-skip-errors=[err_code1,err_code2,...|all] option to specify a list of errors the slave should skip when it encounters them. That fixes my problem temporarily, since the distinct errors I'm running into aren't that numerous. But enabling this option leaves me with a nagging question in the back of my head: are my databases really in sync? I'll probably end up writing some scripts to crawl through the (hundreds of) databases in the master and slave instances to see if any serious 'diffs' exist. I wish MySQL had a physical replication mode (as opposed to a logical mode) so I wouldn't have to deal with this kind of thing.
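Such a crawl would probably look something like the sketch below, comparing CHECKSUM TABLE output between the master and slave (assuming a MySQL version new enough to have CHECKSUM TABLE). The host names are illustrative, credentials are omitted, and for the comparison to mean anything the tables need to be quiet while it runs:
#!/bin/sh
# Sketch: flag tables whose checksums differ between master and slave.
MASTER=db-master
SLAVE=db-slave

for DB in `mysql -h $MASTER -N -B -e 'SHOW DATABASES' | grep -v '^mysql$'`
do
    for TABLE in `mysql -h $MASTER -N -B -e "SHOW TABLES FROM $DB"`
    do
        M=`mysql -h $MASTER -N -B -e "CHECKSUM TABLE $DB.$TABLE"`
        S=`mysql -h $SLAVE  -N -B -e "CHECKSUM TABLE $DB.$TABLE"`
        [ "$M" != "$S" ] && echo "MISMATCH: $DB.$TABLE"
    done
done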
Reader comments: 2
Not your ordinary boring default outage page
Posted on 24 February 2005 22:21 | Permalink
Went to flickr tonight, found this, and laughed out loud.
Flickr is just too cool.
Reader comments: 2
Wikis and scripture study tools
Posted on 16 February 2005 01:07 | Permalink
I wanted to put down some ideas for a scripture study tool. I envision a tool that lets me browse through any number of "books" (Bible, Book of Mormon, Doctrine & Covenants, Apocrypha, History of the Church, etc.), and place "footnotes" anywhere I wanted to in those volumes.
The tool would allow me to browse through each volume, reading like a book, and do various forms of keyword searches. Any footnotes I have added would be displayed at the bottom of each "page" of text, with a superscripted number or letter in the text directing me to the footnote.
I envision this tool working like a Wiki does. For each "page" of text, I could hit the "Edit" button and at the point of interest in the text, add some WAFL code containing the footnote, hit save, and be on my way. The display engine in printing each "page" would auto-calculate footnote numbers/letters, and display the corresponding set of footnotes for each "page" in question.
Along with each footnote I would associate a date/time. This would allow me to browse the tool from the point of view of the footnotes, which would provide a "scripture journal" view of sorts. Each footnote would indicate the "page" and/or location in the volume that it came from.
Being able to reference "verses" in footnote text would be useful too.
The technology to do this is readily available. I could see a Kwiki plugin handling the WAFL code. Whether or not to make it a public "wiki" would be up to the person putting it together.
Just an idea for the meme pool.
Reader comments: 0
Color Scheme Generator
Posted on 11 November 2004 11:00 | Permalink
On occasion in designing websites I've looked for sites that provide various color schemes, to take the guesswork out of making a site look nice from a color standpoint. Eric Meyer blogged this tool today in his recent links section. Very nice.
Reader comments: 0
Google horsepower
Posted on 06 October 2004 14:52 | Permalink
Wow. This blog post discusses some of the aspects of Google's operations. Lots of cool things to think about for a system administrator.
Reader comments: 0