2011-03-24

15 more languages

    Thanks to the expert help of Michael Bauer (akerbeltz), Indigenous Tweets now supports 54 languages! Michael combined his broad knowledge of indigenous and minority languages with many hours of searching on Twitter (even burning through his daily quota of searches on the web site), and was able to find users tweeting in the following languages: Kalaallisut/Greenlandic, Tahitian, Bislama, Sardinian, Corsican, Aymara, ᏣᎳᎩ/Cherokee, Kabyle, Rumantsch, Tongan, Eʋegbe/Ewe, Míkmaq/Micmac, Walon, Asturian, and Yucatán Maya. I used his suggestions as starting points for the crawler and have turned up additional users in many cases.

    I welcome contributions of this kind in the comments. Just give the language name (in English and in the language itself if you know it), the ISO 639-3 code, and a list of Twitter usernames, and I will add them to the site.

    Go raibh míle maith agat a Mhíchíl!

15 comments:

  1. 'S e do bheatha a charaid! It's a really great tool. If you think so too, why not let the top tweeters in other languages know they're top of the list in IT, so as to spread the word and Kevin's workload?

  2. Adiu !

    oci - Occitan (same name in English and Occitan)
    some usernames :
    @caillonm
    @oifarri
    @portadoc
    @partitoccitan
    @JordidOlmieres
    @OccitanParis
    @oldtown_93

    I put the translation in another comment.

  3. @Maime: Occitan has been on the list of languages to add - I'll work on it tonight. Interestingly, I spent quite a bit of time training my web crawler to distinguish the different dialects/sublanguages of Occitan, so you'll see separate entries for each one here:
    http://borel.slu.edu/crubadan/stadas.html.

    This would be too hard to do for tweets unfortunately. It also looks like Ethnologue has removed the separate language codes prv, lnc, gsc, etc. - is that true? Ethnologue page

  4. Ny wra Kernewek nowydha yn fenowgh.
    (Cornish doesn't update regularly)

  5. Ellery, we're still having some growing pains with the site. The plan is still to have each language page update nightly, but the limits imposed on us by Twitter are making it hard to do this until we've "caught up" with all the users in the big languages like Welsh, Basque, etc. Hope you'll be patient until that's straightened out! Go raibh maith agat!

  6. Pur dha! An gwiasva yu bryntin ha pur a bris. Meur ras.

    (Very good! The website is great and very valuable. Thank you.)

  7. Perhaps Ethnologue wants to show that they are not separate languages but part of a single Occitan language. I found this explanation on the website:

    "This listing of dialect names does not represent the results of rigorous dialectological investigations. As with the alternate names, we list the names of dialects which may have been mentioned in published or other sources. Some of these names are village or regional names and may not actually represent significant linguistic variants. In a few cases, the ISO 639-3 standard has assigned individual language identification codes to varieties which we, on the advice of our contributors and consultants, have included in our list of dialects. In such cases, we depart from the ISO 639-3 standard and do not list these varieties separately as individual languages."

    If you can read Provençal Occitan, here's an article about the classification of Occitan dialects.

    http://www.revistadoc.org/file/Linguistica-occitana-7-Sumien

  8. Northern Saami (or Northern Sámi), Davvisámegiella, sme:

    @GusmonVille
    @IngaMS @AMGGraven (some Sámi, some Norwegian)
    @odasfeeda @NRKSapmi (news, sometimes Norwegian)
    @bibbalsatni (bible stuff)

    (from: https://twitter.com/#!/gaski/sami/members)

  9. oh sme is there! I just missed it the first time around... (note: the page should probably say "Davvisámegiella", not just "Sámegiella", there are lots…)

  10. Hey unhammer, thanks for the tweet!

    Linda Wiechetek and I were looking at trying to classify web documents into different Sámi varieties, but we haven't really progressed on that in the last year. So I decided that trying to separate tweets statistically would be too hard for the time being, and I intentionally just used "Sámegiella". If tweeters in the other varieties show up, they will get placed on this same page.

    If you think this is a terrible idea, don't be shy! Really I'm doing the same with Ojibwe varieties and Occitan.

  11. You can add the @Purhepechas user to the P'urhépecha language. We are from Michoacán, a state of México. Everyone is welcome at www.Purhepecha.com

    Everything on your page is great!

    Thank you!

    :)

  12. Thanks for the suggestion and the link to Purhepecha.com! I added @Purhepechas to the page: http://indigenoustweets.com/tsz/. You can add more users through the input box on that page ("Anyone missing?").

    Would you like to translate the page into P'urhépecha? There are just 13 messages to translate, here:
    http://indigenoustweets.blogspot.com/2011/03/new-languages.html.

  13. Can you please add Telugu also? Here are the details for Telugu:

    Language name: Telugu
    Language name in Telugu itself: తెలుగు
    ISO Code: tel

    Twitter user names:
    @tveeven
    @tuxnani
    @sirishtummala
    @padaanveshi
    @etelugu
    @sridharcera
    @hyderabadbook
    @koodalidotorg

    I'll provide the translations in the other post.

  14. Thanks for sending the Telugu usernames! Twitter limits the number of queries I can make to their API, so I'm not adding any more "big" languages to the site for the time being. I hope to have expanded access to the API soon, and will add Telugu, Catalan, Galician, and Esperanto then.
