Village DB: About

Welcome to the Roots Village Database, a digitization of the information from the Index of Clan Names By Villages published by the American Consulate General in Hong Kong in the 1970s. Originally used to investigate immigration fraud, this data is now valuable for genealogy research.

Currently, data entry for Toisan is complete. Hoiping data entry has just started. Please be patient; this is an entirely volunteer-driven project. Eventually we will have data from four counties, and all sorts of fancy indexes. :-)

News

2005 June 27 - The database has been moved to a new home under the Chinese Culture Center of San Francisco. Please let us know if there are any problems.

The data here comes from the Index of Clan Names By Villages. There are four books, one each for Toishan, Sunwui, Hoiping, and Chungshan. Eventually, we'll put up the introduction from those books, but for now this note should suffice:

Note for the Reprint Edition

Thanks to Him Mark Lai for wanting this to happen in the first place; to Beatrice Yu, for setting up the groundwork; and to Tony Tong, for coordinating the project in its current form.

Frequently Asked Questions

Q: What does "Map Location" in the Heungs mean? Are there gonna be maps?

A: According to the introduction of the Index, the map locations are keyed to the grid coordinates of the U.S. Army Map Service Series covering Kwangtung Province. We don't have these maps handy, but eventually we'd like to get our hands on some sort of map. Meanwhile, you can try the following pages:

Siyi Genealogy
Maps of Taishan County

Q: Friggin', it's been two years since the last update. What the hey?

A: This database project is entirely volunteer-driven, so progress isn't as regular as one might hope. Also, all the programming is done by one person (me), so the project's hit-by-a-bus risk is rather high. As it turns out, in June 2002 I began suffering from RSI, or repetitive strain injury, which I'm still dealing with, though fortunately I can type again. On top of that, I spent the last school year out of the country (in Taiwan), which made it difficult to work on the database. But don't worry, I'm back now.

Moral of the story: DON'T IGNORE WRIST PAIN! Rest often. Improve your posture. Drink water. Stretch, exercise, get moving. Believe me, it sucks when it hurts just to write in your journal, or use chopsticks.

read more about RSI
Typing Injury FAQ

Technical Notes

This site requires CSS and cookies to be enabled, and works best with JavaScript turned on.

For those of you who are curious, the database runs on a MySQL backend, with Perl CGI scripts for the interface. MySQL is fast, free (open source), and can handle lots of data. Perl is just cool.

MySQL
Perl
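The interface code itself isn't published here, so the following is only a rough sketch of the kind of query that could sit behind searching surnames by Chinese character and showing each village's enclosing County, Area, and Heung (both listed in the version history). It uses Python's standard-library sqlite3 module as a stand-in for MySQL, and the schema, table names, and sample rows are all hypothetical placeholders, not the real database.

```python
import sqlite3

# Hypothetical schema mirroring the County > Area > Heung > Village nesting.
# The real database's tables and columns are not documented on this page.
SCHEMA = """
CREATE TABLE county  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE area    (id INTEGER PRIMARY KEY,
                      county_id INTEGER NOT NULL REFERENCES county(id),
                      name TEXT NOT NULL);
CREATE TABLE heung   (id INTEGER PRIMARY KEY,
                      area_id INTEGER NOT NULL REFERENCES area(id),
                      name TEXT NOT NULL);
CREATE TABLE village (id INTEGER PRIMARY KEY,
                      heung_id INTEGER NOT NULL REFERENCES heung(id),
                      name TEXT NOT NULL,
                      surname TEXT NOT NULL);
"""


def villages_by_surname(conn, surname):
    """Return (county, area, heung, village) rows for one surname character."""
    # The surname is a bound parameter, never pasted into the SQL text,
    # so the driver handles quoting and escaping.
    return conn.execute(
        """
        SELECT c.name, a.name, h.name, v.name
        FROM village v
        JOIN heung  h ON v.heung_id  = h.id
        JOIN area   a ON h.area_id   = a.id
        JOIN county c ON a.county_id = c.id
        WHERE v.surname = ?
        ORDER BY v.name
        """,
        (surname,),
    ).fetchall()


def demo_connection():
    """In-memory database with placeholder rows (not real index data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO county VALUES (1, 'Toisan')")
    conn.execute("INSERT INTO area VALUES (1, 1, 'Area A')")
    conn.execute("INSERT INTO heung VALUES (1, 1, 'Heung B')")
    conn.executemany(
        "INSERT INTO village VALUES (?, 1, ?, ?)",
        [(1, "Village C", "許"), (2, "Village D", "黃")],
    )
    return conn


conn = demo_connection()
print(villages_by_surname(conn, "許"))
```

Searching by the character itself, rather than by a romanization, sidesteps the many competing spellings of the same surname.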

The database is being developed on Mac OS X, with the help of BBEdit, CocoaMySQL, and Safari.

Mac OS X
CocoaMySQL
BBEdit

Really Technical Notes

The Chinese characters on these pages are encoded in Big5-HKSCS, which is Big5 (the standard encoding for traditional Chinese) plus a bunch of Cantonese-specific characters that the Hong Kong government added. (Big5 itself encodes 13,060 characters or so.)

Chinese is input through STC, or Standard Telegraph Code, which maps each 4-digit code to a character. There are apparently two different telegraph encodings, one for Taiwan and one for mainland China. The version used in our data is the mainland variety, which is said to be documented in a book entitled 《電報明碼》 (roughly, "Telegraph Plain Code"). Naturally, this book is nowhere to be found. (I hadn't had the chance to beam over to Hong Kong and search the large bookstores there. Update: I have, and it's still nowhere to be found, though I didn't have time to search the big libraries there.) Meanwhile, the various tables out on the internet are rife with mistakes.

The telegraph data that this database uses is culled mainly from the Unihan data compiled by the Unicode Consortium. Thank you, Unicode. This data, combined with a couple of other sources, gives us a telegraph code table of 7,977 characters, which still appears to be missing a few. If anyone knows where I might find a more complete table, or the book, please let me know (email at bottom of page).

Unicode Data - look for Unihan.txt
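As a minimal sketch of how a telegraph table can be pulled out of the Unihan data, the Python below assumes Unihan's tab-separated layout (`U+XXXX`, field name, value) and reads the `kMainlandTelegraph` field, since our data uses the mainland variety. The function names are hypothetical, `merge_tables` only illustrates one way of combining "a couple of other sources" (fill gaps, never override), and the sample lines are illustrative rather than copied from the real file.

```python
def load_stc_table(lines, field="kMainlandTelegraph"):
    """Build a dict mapping 4-digit STC strings to characters.

    Expects lines shaped like "U+4E00<TAB>kMainlandTelegraph<TAB>0001".
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) < 3 or parts[1] != field:
            continue  # not the telegraph field we want
        char = chr(int(parts[0][2:], 16))  # "U+4E00" -> code point 0x4E00
        for code in parts[2].split():  # tolerate several codes in one field
            table.setdefault(code, char)  # keep the first character seen
    return table


def merge_tables(primary, *fallbacks):
    """Fill gaps in the primary table from other sources without overriding it."""
    merged = dict(primary)
    for extra in fallbacks:
        for code, char in extra.items():
            merged.setdefault(code, char)
    return merged


def stc_to_chars(codes, table):
    """Convert STC codes to characters; unknown codes come back as None."""
    return [table.get(code.zfill(4)) for code in codes]


sample = [
    "# illustrative lines in Unihan format",
    "U+4E00\tkMainlandTelegraph\t0001",
    "U+4E00\tkMandarin\tyī",
]
table = load_stc_table(sample)
print(stc_to_chars(["0001", "9999"], table))  # ['一', None]
```

Returning `None` for unknown codes, rather than guessing, makes gaps in the table easy to spot during data entry.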

In addition, the database uses pinyin (Mandarin) and jyutping (Cantonese) romanizations, also looked up from tables. The pinyin table has information for 13,024 characters. The jyutping table, provided by the LSHK (Linguistic Society of Hong Kong), contains 10,675 characters. Where a character has more than one possible pronunciation, I've tried to choose the most likely one. Let me know if you run into problems.

Chih-Hao Tsai's Technology Page - the source for the pinyin table
LSHK
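A minimal sketch of a table-driven lookup that picks one reading per character, assuming (hypothetically) that each table has been loaded into a dict from character to a list of readings ordered from most to least likely, with a small hand-maintained override dict for cases where the usual reading is wrong. The real table formats and selection rules aren't described here; the two jyutping entries are just examples.

```python
def romanize(text, table, preferred=None):
    """Romanize each character using a char -> [readings] table.

    A hand-picked reading in `preferred` wins; otherwise the first listed
    reading is used; characters missing from the table become "?".
    """
    preferred = preferred or {}
    syllables = []
    for ch in text:
        if ch in preferred:
            syllables.append(preferred[ch])
        elif table.get(ch):
            syllables.append(table[ch][0])  # assumed "most likely" reading
        else:
            syllables.append("?")  # flag gaps instead of guessing
    return " ".join(syllables)


jyutping = {"黃": ["wong4"], "長": ["coeng4", "zoeng2"]}
print(romanize("黃長", jyutping))                    # wong4 coeng4
print(romanize("黃長", jyutping, {"長": "zoeng2"}))  # wong4 zoeng2
print(romanize("黃李", jyutping))                    # wong4 ?
```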

Links

Toisanese language

Version History

0.95 - 2005.01.29
  • added dynamic show/hide of pinyin, STC, etc.
0.94 - 2005.01.10
  • fixed an unfortunate bug which caused searching of village names not to work at all
0.93 - 2004.11.22
  • fixed an obscure encoding problem that prevented searching for the surname Hui (it's actually a MySQL bug, I think); thanks to Warren Huie for alerting me to this problem
0.92 - 2004.06.26
  • HTML redesign
  • long lists now sorted
0.91 - 2004.02.01
  • added name search to village
  • pinyin display
0.9 - 2004.01.27
  • improved search: surnames are now searched by Chinese character instead of romanization
  • village search now shows enclosing County, Area, and Heung
  • selecting a surname from the popup menu automatically runs the search, and the selection "sticks"
  • made data entry more reliable (error checks for empty fields, duplicates, etc.)
  • improved editing, added deletion of entries
  • fixed up some errors in the data
0.83 - 2002.03.01
  • added rudimentary search
0.81 - 2002.01.21
  • made title (of browser window) more descriptive
  • numbered listings
  • made multiple names (aka's) explicit and easier to read in listings
  • added surname to Village listings
0.8 - 2001.12.08
  • first public release, for the first data entry party
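The 0.93 entry doesn't say what the Hui bug actually was. One plausible cause, assuming the surname in question is 許: in Big5 and Big5-HKSCS, 許 is encoded as the bytes 0xB3 0x5C, and 0x5C is also the ASCII backslash. Any layer that treats those bytes as ASCII text and processes backslash escapes can mangle the character. This is a hedged illustration of that general Big5 pitfall, not a diagnosis of this site's code; the function names are hypothetical.

```python
BACKSLASH = 0x5C  # ASCII '\', which is also a legal Big5 trail byte


def backslash_trail_chars(text, encoding="big5hkscs"):
    """Return the characters whose multibyte encoding ends in byte 0x5C."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        # Big5 lead bytes are >= 0x81, so a 0x5C can only be the trail byte.
        if len(encoded) > 1 and encoded[-1] == BACKSLASH:
            hits.append(ch)
    return hits


def survives_unescaping(ch, encoding="big5hkscs"):
    """Simulate a byte-level layer that strips backslashes, then try to decode."""
    mangled = ch.encode(encoding).replace(b"\\", b"")
    try:
        return mangled.decode(encoding) == ch
    except UnicodeDecodeError:
        return False  # a lone lead byte is not valid Big5-HKSCS


print(backslash_trail_chars("一許"))  # ['許']
print(survives_unescaping("許"))     # False
print(survives_unescaping("一"))     # True
```

Passing search terms as bound query parameters, and making sure the client and server agree on the connection's character set, avoids this class of problem.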

Fine Print

Limited time offer. Not available in stores. Void where prohibited. Not for children under the age of 5. Do not fold, spindle, or mutilate.
last modified 2005 June 27 by Dominic Yu | contact: dyupc@blyt.net