15 more languages

    Thanks to the expert help of Michael Bauer (akerbeltz), Indigenous Tweets now supports 54 languages!   Michael combined his broad knowledge of indigenous and minority languages with many hours of searching on Twitter (even burning through his daily quota of searches on the web site), and was able to find users tweeting in the following languages: Kalaallisut/Greenlandic, Tahitian, Bislama, Sardinian, Corsican, Aymara, ᏣᎳᎩ/Cherokee, Kabyle, Rumantsch, Tongan, Eʋegbe/Ewe, Míkmaq/Micmac, Walon, Asturian, and Yucatán Maya.  I used his suggestions as starting points for the crawler and have turned up additional users in many cases.

    I welcome contributions of this kind in the comments.   Just give the language name (in English and in the language itself if you know it), the ISO 639-3 code, and a list of Twitter usernames and I will add them to the site.
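    As a rough illustration of what a complete suggestion contains, here is a minimal Python sketch. Everything in it is hypothetical: the `Suggestion` class, its field names, and the example username are invented for illustration, and the username rule (1 to 15 letters, digits, or underscores) is an assumption rather than something the site is known to check. The only firm fact used is that ISO 639-3 codes are three lowercase letters.

```python
import re
from dataclasses import dataclass

# ISO 639-3 codes are exactly three lowercase ASCII letters.
ISO_639_3 = re.compile(r"[a-z]{3}")
# Assumed username rule: 1-15 letters, digits, or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{1,15}")


@dataclass
class Suggestion:
    """One language suggestion (hypothetical structure)."""
    english_name: str
    iso_code: str
    usernames: list
    native_name: str = ""  # optional: the name in the language itself

    def problems(self):
        """Return a list of human-readable problems; empty if none found."""
        issues = []
        if not ISO_639_3.fullmatch(self.iso_code):
            issues.append(f"not an ISO 639-3 code: {self.iso_code!r}")
        if not self.usernames:
            issues.append("no Twitter usernames given")
        for name in self.usernames:
            # Accept names written with or without a leading "@".
            if not USERNAME.fullmatch(name.lstrip("@")):
                issues.append(f"invalid username: {name!r}")
        return issues


# "example_user" is a placeholder, not a real account.
good = Suggestion("Cornish", "cor", ["@example_user"], native_name="Kernewek")
bad = Suggestion("Cornish", "kw", ["bad name!"])
```

    A check like this only catches malformed input; it cannot tell whether the code really belongs to the named language or whether the accounts exist.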

    Go raibh míle maith agat a Mhíchíl!


How many languages are out there?

    I added two more languages over the weekend: Inuktitut/ᐃᓄᒃᑎᑐᑦ, thanks to some prompting from Tim Pasch, and Rangi/Kɨlaangi, thanks to Oliver Stegen, who wrote what we think are the first tweets in that language.  As the site is set up now, it will only detect Inuktitut tweets written in syllabics, although if I have time I may extend it to find examples in Latin script as well.

    Two more language pages were also translated over the weekend: Chichewa, thanks to Edmond Kachale (who was kind enough to blog about us too), and Welsh, thanks to Carl Morris, Rhys Wynne, and Gareth Jones.

    Just how many languages are out there on Twitter?  This is a question I've been exploring for many years in the broader context of the web, where my Crúbadán web crawler has found documents written in almost 500 languages.   Those texts are used to train the language recognition algorithms that drive IndigenousTweets.com (I'm planning a blog post on the details of the language recognition).   I could conceivably add any of these 500 languages to IndigenousTweets, with the following restrictions:

  1. Twitter limits the number of queries I can make to their API, so I don't plan on adding any languages whose Twitter communities are more active than the top languages I have now: Haitian Creole, Basque and Welsh.   It's unlikely I can even get everything in Creole; my friend Jean Came Poulard conjectures there may be at least half a million people tweeting in the language.
  2. My language recognition algorithms work well at the level of full documents, but tweets are more challenging: they are 140 characters or less and often contain URLs, abbreviations, etc.   As a result, many of the languages I'd like to include are proving very hard to handle; distinguishing the Filipino languages Cebuano, Tagalog, and Hiligaynon is one example.
  3. Finally, my guess is that there is no one using Twitter in the vast majority of the other 400+ languages, at least not yet.   I should mention that I've set up IndigenousTweets for several other languages and made a non-trivial attempt at finding tweeters, with no luck: Aymara, Bislama, Kashubian, Marshallese, Pohnpeian, Sango, and Songhay.
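
The difficulty with short, noisy tweets can be made concrete with a small sketch. The posts do not describe how the recognition actually works, so this assumes a common technique, character trigram models with add-one smoothing, and is not a description of the site's own method. The names `TRAINING`, `clean`, `train`, and `identify` are hypothetical, and the two training sentences are toy data; a real system would train on far larger corpora.

```python
import math
import re
from collections import Counter

# Toy training text (hypothetical); keys are ISO 639-1 codes for Welsh and Irish.
TRAINING = {
    "cy": "mae hi'n braf heddiw ac rydw i'n mynd i'r dref gyda fy ffrindiau",
    "ga": "tá an aimsir go hálainn inniu agus táim ag dul go dtí an baile mór",
}

# URLs, @mentions and #hashtags carry little signal about the language.
NOISE = re.compile(r"https?://\S+|[@#]\w+")


def clean(text):
    """Lowercase, drop URLs/@mentions/#hashtags, and collapse whitespace."""
    return " ".join(NOISE.sub(" ", text.lower()).split())


def trigrams(text):
    """Character trigrams of the cleaned text, padded with one space each side."""
    padded = f" {clean(text)} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpora):
    """Build one trigram frequency table per language."""
    return {lang: Counter(trigrams(text)) for lang, text in corpora.items()}


def score(model, text, vocab_size):
    """Log-probability of the text under one model, with add-one smoothing."""
    total = sum(model.values())
    return sum(
        math.log((model[gram] + 1) / (total + vocab_size))
        for gram in trigrams(text)
    )


def identify(models, text):
    """Return the language whose model gives the text the highest score."""
    vocab = {gram for model in models.values() for gram in model}
    return max(models, key=lambda lang: score(models[lang], text, len(vocab)))


models = train(TRAINING)
guess = identify(models, "tá an aimsir go hálainn inniu http://example.com @duine")
```

A 140-character tweet yields at most around 140 trigrams, and fewer once URLs and mentions are stripped, so closely related languages that share most of their common trigrams leave very little evidence to separate them.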

Please keep the suggestions for new languages coming, and if you can point me to one or two people you know are tweeting in the language, that's a big help.


New Languages!

    Yesterday, I added Kernewek (Cornish) and Sámegiella (Sámi) to Indigenous Tweets, and was happy to see there are at least 27 people tweeting in Kernewek. 

    I also added translations of the Basque and Wolof pages, thanks to Julen Ruiz Aizpuru and El Hadji Beye for those.   I'd like all of the pages to be translated eventually; please get in touch if you're willing to help with that.  There are just 13 short strings to translate into each language:

  • Trending:
  • Anyone missing?
  • Twitter username:
  • Submit
  • User
  • Total
  • Followers
  • Following
  • Last Tweet
  • Thanks!
  • Invalid name.
  • Tell the world you're here:
  • I'm a top tweeter in my language!



    Beannachtaí na Féile Pádraig!  I've started this blog as a companion to my new web site IndigenousTweets.com.  Please check out that site, and I hope you'll also subscribe to this blog if you are a speaker of, or are interested in, an indigenous language.  I am planning on discussing best practices for developing basic language technologies like keyboards, spell checkers, etc. and I'll also be interviewing people from around the world who are using technology as part of language revitalization efforts.

Project Background

    Speakers of indigenous and minority languages around the world are struggling to keep their languages and cultures alive.  More and more language groups are turning to the web as a tool for language revitalization, and as a result there are now thousands of people blogging and using social media sites like Facebook and Twitter in their native language.   These sites have allowed sometimes-scattered communities to connect and use their languages online in a natural way.   Social media have also been important in engaging young people, who are the most important demographic in language revitalization efforts.  Together we're breaking down the idea that only global languages like English and French have a place online!

How to use IndigenousTweets.com

   The primary aim of IndigenousTweets.com is to help build online language communities through Twitter.   We hope that the site makes it easier for speakers of indigenous and minority languages to find each other in the vast sea of English, French, Spanish, and other global languages that dominate Twitter.

   The main page lists all of our supported languages (35 as of the launch).   Find your language in the table, click on the row, and you will be directed to a new page that lists (up to) the top 500 Twitter users in your language.   For instance, here's the page for Ojibwe/Anishinaabemowin.   This is meant to be a kind of "menu" of people who tweet in your language whom you might want to follow on Twitter.    If you click on someone in the table, it will open a new window or tab with their Twitter profile, so you can see some of their recent tweets and decide if you want to follow them or not.    The tables are sortable by any of the columns; this is useful for example if you (like me) only want to follow people who tweet primarily in your language - just sort by the % column.   Or you might be interested in the most popular tweeters in your language - in that case, sort by the "Followers" column.
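
   The two sort orders described above can be sketched in a few lines of Python. The `Tweeter` class, its field names, and the sample rows are all made up for illustration; the real tables are sorted in the browser, and nothing here reflects how the site implements that.

```python
from dataclasses import dataclass


@dataclass
class Tweeter:
    """One row of a language table (hypothetical fields and data)."""
    user: str
    total: int       # tweets counted for this user
    percent: float   # share of those tweets that are in the language
    followers: int


rows = [
    Tweeter("user_a", total=1200, percent=35.0, followers=4000),
    Tweeter("user_b", total=300, percent=98.5, followers=150),
    Tweeter("user_c", total=800, percent=72.0, followers=900),
]


def sort_by(table, column, descending=True):
    """Return a new list of rows ordered by the named column."""
    return sorted(table, key=lambda row: getattr(row, column), reverse=descending)


# People who tweet primarily in the language come first.
mostly_in_language = [row.user for row in sort_by(rows, "percent")]
# The most popular tweeters come first.
most_followed = [row.user for row in sort_by(rows, "followers")]
```

   In this made-up data the two orderings disagree: the account with the most followers tweets in the language least often, which is why having both columns sortable is useful.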

    Another feature that I hope people will enjoy is the "Trending Topics" computed by language.  Twitter computes their trends based on geography, and so it is unlikely that there would ever be enough tweets in Irish to impact the trends in Ireland, or for Basque tweets to appear in the Spanish trends, and so on.   Our Trending Topics are listed on each language page in the right-hand column.  If you click a trending topic on IndigenousTweets.com, it will open a search page for that term on Twitter's site.
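
    To show the idea of grouping trends by language rather than by geography, here is a minimal sketch. The post does not say how the trends are actually computed, so this assumes the simplest possible approach: count hashtags per language and keep the most frequent. The function name `trending_by_language`, the language labels, and the sample tweets are hypothetical.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")


def trending_by_language(tweets, top_n=3):
    """Map each language to its most frequent hashtags.

    `tweets` is an iterable of (language, text) pairs, where the language
    label is assumed to come from an earlier language-identification step.
    """
    counts = defaultdict(Counter)
    for lang, text in tweets:
        # Use a set so a tag repeated within one tweet is counted once.
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        counts[lang].update(tags)
    return {
        lang: [tag for tag, _ in counter.most_common(top_n)]
        for lang, counter in counts.items()
    }


# Made-up sample data.
sample = [
    ("ga", "Lá breá inniu #Gaeilge #seachtain"),
    ("ga", "#gaeilge abú"),
    ("eu", "Kaixo #euskara"),
    ("ga", "Ceol anocht #gaeilge #seachtain"),
]
trends = trending_by_language(sample, top_n=2)
```

    Because the counts are kept separately for each language, a handful of Irish tweets can produce an Irish trend even though they would vanish in a count over all tweets from Ireland.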

    Finally, if you notice anyone missing from the tables, just enter their Twitter username in the input box on the language page, and they will eventually be added to the table.

    Even speakers of languages like Basque and Welsh with vibrant online communities have been surprised to find just how many people there are tweeting in their language.   This is the other goal of IndigenousTweets.com: it's a message to the world that says "We are here and we're proud of our languages".   For languages with just a few users, I hope it inspires some people to start - make your voice heard!


    I've gotten help and feedback from a number of people over the past several weeks as this project has taken shape.   I'm especially grateful to Jean Came Poulard, Michael Bauer, Keola Donaghy, Boukary Konaté, Adrian Cain, Chris Sheard, and Wim Benes for providing translations of the site into Haitian Creole, Scottish Gaelic, Hawaiian, Bambara, Manx Gaelic, and Frisian.    Thanks also to Michael Schade, Edmond Kachale, and Neskie Manuel, who provided much-appreciated technical advice and constructive criticism.

    IndigenousTweets.com is not affiliated in any way with Twitter, and it was created entirely as a free service to benefit indigenous language communities.   Enjoy!