We've reached some milestones in the last week. First, I've added 17 new languages to the site since the last update, so there are now 71 supported languages in all, more than twice the number we started with three weeks ago. Again, Michael Bauer helped with several of these, and a number of people also wrote to me after the BBC interview asking if I would support their language. Here's the full list of new languages:
Among these are our first indigenous Australian language (Gamilaraay, with 3 speakers according to Ethnologue) and two other critically endangered languages: Ainu (~15 speakers in Japan) and Nawat (~20 speakers, all older). Thanks to Alan R. King, who provided training data for Nawat and who is responsible for the first couple of tweets in that language.
We also have a number of new translations. The first round of translations came mostly from friends working on the Firefox localization teams. Many of these new translations are directly from members of supported language communities on Twitter: Rumantsch (Gion-Andri Cantieni, @gionandri), Setswana (Sternly Simon, @talk2ras), Kɨlaangi (Oliver Stegen, @babatabita), Occitan (Maxime Caillon, @caillonm), Kernewek/Cornish (John Gillingham, @Bodrugan), Brezhoneg/Breton (Ahmed Razoui, @duzodu), and Nawat/Pipil (Alan R. King, @alanrking). We also have a translation into Marshallese from Marco Mora, but no tweets in that language yet!
One additional milestone. The site is generated by a program that "crawls" Twitter users, grabbing the tweets on their timeline and performing statistical language recognition on those tweets (details to come). Then, if a given user has more than a certain fraction of their tweets in the target language, that user's followers are added to a queue to be checked in the same way. In the last couple of days, the initial crawls for Basque and Welsh were completed, meaning the initial crawls for all languages, with the exception of Haitian Creole, are now complete. Therefore the number of users currently listed for each language should represent a good initial estimate of the total user base on Twitter. Of course the program will continue to add new users as they are discovered by the crawler (through random search queries for words in each language) and as they are suggested via the form on each language page on IndigenousTweets.com.
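For readers who like to see this sort of thing in code, here is a rough sketch of the queue-based crawl described above. It is only an illustration, not the actual program behind the site: the function names (`crawl`, `fetch_tweets`, `fetch_followers`, `in_language`), the 0.5 threshold, and the toy data are all assumptions made up for this example, and a simple prefix check stands in for the real statistical language recognition.

```python
from collections import deque

def crawl(seeds, fetch_tweets, fetch_followers, in_language, min_fraction=0.5):
    """Breadth-first crawl: keep users who tweet mostly in the target
    language and queue their followers to be checked the same way.
    All callables and the threshold are hypothetical stand-ins."""
    queue = deque(seeds)
    seen = set(seeds)          # users already queued, so nobody is checked twice
    accepted = []
    while queue:
        user = queue.popleft()
        tweets = list(fetch_tweets(user))
        if not tweets:
            continue           # nothing to classify for this user
        fraction = sum(1 for t in tweets if in_language(t)) / len(tweets)
        if fraction > min_fraction:
            accepted.append(user)
            # Only followers of accepted users are added to the queue.
            for follower in fetch_followers(user):
                if follower not in seen:
                    seen.add(follower)
                    queue.append(follower)
    return accepted

# Toy in-memory data standing in for Twitter (purely illustrative).
TWEETS = {
    "a": ["cy one", "cy two", "en three"],
    "b": ["cy x", "cy y"],
    "c": ["en p", "en q"],
    "d": ["cy z"],
}
FOLLOWERS = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}

def toy_in_language(tweet):
    # Stand-in for statistical language recognition: a "cy" prefix marks
    # a tweet as being in the target language.
    return tweet.startswith("cy")

result = crawl(
    ["a"],
    fetch_tweets=lambda u: TWEETS.get(u, []),
    fetch_followers=lambda u: FOLLOWERS.get(u, []),
    in_language=toy_in_language,
)
print(result)
```

In this toy run, user "c" is checked but not kept, and "d" is still reached because "b" was accepted and "d" follows "b". The crawl for a language is "complete" in this sense when the queue runs dry.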
Haitian Creole is a special case and will remain so. As noted in an earlier post, we expect there are at least 100,000 people tweeting in Creole and it is unlikely I can keep up with all of them given the limits imposed by Twitter, but I will do my best.
Next milestone: 100 languages!