2014-04-29

Social media in bilingual environments: online practices of Frisian teenagers

   The following is a guest post by Lysbeth Jongbloed, researcher at the Fryske Akademy, specializing in the use of the Frisian language in social media.  We're grateful to Lysbeth for taking the time to share her research with us!



Lysbeth Jongbloed
Probably most of you know the Netherlands: from tulips, clogs, or Amsterdam. Most people in the Netherlands speak Dutch, a West Germanic language. However, in the north of the Netherlands, in the province of Fryslân, we speak a different language: Frisian. Frisian is, besides Dutch, the second officially recognised language in the Netherlands. In Fryslân, the legal status of Frisian and Dutch is equal. In practice, however, Dutch is the dominant language in many domains, and in many schools education in Frisian is rather limited. It is estimated that Frisian is the mother tongue of around half of the Frisian population, some 350,000 people. Frisian is mainly a spoken language: while 85% of the population can speak the language, only 12% indicate that they can write it well (De Fryske Taalatlas, 2011).

Frisian Twitter conversations; map by Indigenous Tweets
Research in Fryslân

In Fryslân, the Mercator Research Centre and the Fryske Akademy carry out fundamental and applied research in the fields of the Frisian language, culture, history and society. One of the current projects studies language use on social media. The expectation is that social media offer chances for minority languages to increase their vitality.

In 2013 and early 2014 the Mercator Research Centre received financial support from the Province of Fryslân and the municipality of Leeuwarden (capital of Fryslân) to research the language use of Frisian teenagers between 14 and 18 on social media. The outcomes of this research are discussed below. Are you also studying the use of your minority language on the internet? We are interested in setting up an international network so we can compare results and initiate European funded projects in the future. Read more about these plans at the end of this post.
#frysk was the #1 trending topic in the Netherlands for 7 hours on April 17th
WhatsApp most popular social media platform

Twenty Frisian schools for secondary general and vocational education participated in the research. As a result, over 2,000 Frisian teenagers filled in an extensive questionnaire. Almost all Frisian teenagers (98%) use social media. 95% of the teenagers use WhatsApp (a cross-platform mobile messaging app), 86% use Facebook and 76% use Twitter. Of the three, WhatsApp is used most: 47% chose the answer 'only when I am asleep, I do not check WhatsApp'.

Oral rather than written language


In general it can be concluded that Frisian is still an oral rather than a written language. For Frisian teenagers, Dutch is the dominant language in writing. On average, the more formal the medium, the less often Frisian is used. For instance, for text messages and WhatsApp approximately half of the Frisian-speaking teenagers use Frisian. On Facebook and Twitter that proportion decreases to around 30%, and in emails it is 15%. In personal messages Frisian is used more than in public or group messages.

Phonetic writing

Frisian is often written phonetically. Most teenagers are aware of that but do not mind: 'People will understand what I mean anyway.' Some think it is too much work to add all diacritics, others are not sure when to use them. Furthermore, the influence of Dutch is clearly visible in the teenagers' written language, and so is the use of dialect and abbreviations that are typical of social media. It also often happens that different languages are mixed intentionally.

Teenagers from the ‘Walden’ region use Frisian most on social media
Regional differences

In the province of Fryslân, big differences have been found regarding Frisian language use. In general, Frisian is hardly used in the big cities while it is much more common to use Frisian on social media in smaller towns and in the north-east of Fryslân.

Determining factors

The language one prefers to speak is the main factor determining one's language use on social media. Other factors affecting language choice are one's attitude towards Frisian, one’s writing skills in Frisian, and the general attitude towards Frisian at one's school.

Approximately one fifth of the Frisian-speaking teenagers never use Frisian on social media. The main reason is that they find it difficult to write Frisian, but it also has to do with their surroundings not being Frisian and their own attitude towards Frisian.

Qualitative Twitter research

Besides mapping language use of Frisian teenagers by means of a questionnaire, I also studied tweets of 50 Frisian teenagers. The 50 teenagers for the Twitter research were selected from the participants of the second ‘Fryske Twitterdei’ (Frisian Twitter day), which was organised on April 18th 2013 by the organisation ‘Praat mar Frysk’ (Do speak Frisian). During this day people were encouraged to send Frisian tweets in combination with the hashtag Frysk. The whole day #Frysk was a trending topic in the Netherlands, and almost 10,000 tweets were sent with the hashtag Frysk. Per participant, their last 50 tweets before the Twitter day, their tweets on the Twitter day, and their first 50 tweets after the Twitter day were analysed: in total over 6,000 tweets.

Share of Frisian tweets

The analysis shows that on regular days, just over 10% of the tweets were in Frisian and 65% were in Dutch. On the Frisian Twitter day 53% were in Frisian and 29% in Dutch. Although the Twitter day has a strong upward effect on the use of Frisian in tweets, the effect is not long-lasting.

Variables of influence on language choice


Variables of influence on language choice are the type of tweet and gender. The proportion of Frisian is highest in messages addressed to a particular person. On regular days 25% of those tweets are in Frisian. On the Twitter day the proportion doubles to almost half. The use of Frisian in other types of messages rises from under 10% to over 50%. In the analysed sample, the male teenagers tweet much more in Frisian than their female counterparts.

Every Wednesday @praatmarfrysk tweets a Frisian poem. On April 16th it was a poem about the Twitter Day.
Frisian Twitter day 2014

Last week, on April 17th, the third Frisian Twitter day was organised, and again it was a big success: #Frysk was a trending topic in the Netherlands for the whole day, and for seven hours it was even the number one trending topic. Over 6 million people saw #Frysk or #frysketwitterdei on their timeline, and tweets came from over 25 countries.

Further research


The Province of Fryslân has granted a new subsidy to the Mercator Research Centre of the Fryske Akademy to carry out further research into Frisian language use on social media in 2014 and 2015. In particular, we will address the question of what dynamics in a multilingual society lead to the use or non-use of a minority language on social media. To answer this question, we are also looking for partners in other minority language regions with whom we can compare research outcomes. We would like to build up an expert network to initiate European funded projects in the future. Please contact @lysbeth2_0 if you are interested in participating. For more information about the Frisian Twitter day, you can contact @praatmarfrysk.

2014-02-27

Indigenous Tweets #IMLD14 Roundup

Last Friday February 21st was International Mother Language Day, a celebration of linguistic diversity originally created by UNESCO in 1999.  This year, together with Rising Voices and the Living Tongues Institute, we tried to encourage people to tweet in their native language using the hashtag #imld14 (#dilm14 in Spanish).    We were thrilled with the response, and you can see some of the many tweets by searching for #imld14 on Twitter, or by checking out the Storify created by Laura Morris from Rising Voices.

For fun, I looked specifically at tweets written in any of the 157 languages we're tracking on the Indigenous Tweets site. In all, there were 491 tweets containing #imld14 or #dilm14, written in 31 of the 157 languages.  Leading the way were Gàidhlig with 158 tweets, followed by 74 tweets in Aragonese, 45 in Ojibwe/Nishnaabemwin, 41 in Malagasy, and 28 in Irish/Gaeilge.

One of the primary goals of the Indigenous Tweets project is to get people to use their language every day on Twitter and other social media sites.    We hope that a few of you who did this for the first time for #imld14 will continue to tweet in your native language and encourage others in your community to do the same.

For additional inspiration, we'll close with a sampling of tweets in a few other languages.   Looking forward to an even better turnout for #imld15!!

Chichewa:
Nahuatl:

Manx Gaelic:

Lezgian:
Karuk:
Nez Perce:
North Sámi:
Māori:




2013-12-29

Mapping the Celtic Twittersphere

Over the last couple of weeks I've created maps showing the Twitter conversations taking place in the Irish, Basque, and Māori languages.  The inspiration for this came from an email conversation with Paora Mato from the University of Waikato in Aotearoa, who has co-authored (with Te Taka Keegan) an excellent analysis of the Māori Twitter community based on data from Indigenous Tweets (forthcoming).   Since people seemed to enjoy the maps I decided to do similar ones for the other Celtic languages (Welsh, Scottish Gaelic, Manx Gaelic, Cornish, and Breton) which you'll find below.

Welsh language Twitter conversations (CC-BY-SA)
These maps were all created in more-or-less the same way.  I started with the lists of people tweeting in each language from the Indigenous Tweets site – the site includes everyone tweeting in the smaller languages like Breton, Cornish, and Māori, and the top-500 most active users for Irish, Basque, Welsh, etc.

Irish language Twitter conversations (CC-BY-SA)
Next, a small percentage of Twitter users have geolocation activated for their tweets, which means that when they tweet from a mobile device, a latitude and longitude are recorded in Twitter's database along with the tweet.  These coordinates are then accessible to developers like me through the Twitter API.  For users without geolocation activated, I just collected the (self-reported) location from their Twitter profile, canonicalized the placenames, and looked up the lat/longs in a database.  For these users, I assumed that all of their tweets were sent from the resulting location.  This means, for example, that all tweets from people whose profile location is set to "Dublin", "Baile Átha Cliath", "BÁC", or variants thereof will appear to come from one particular location near the center of the city – whatever's in the database (as it happens, it's the Dublin Spire).   This isn't really a problem since I'm only interested in creating maps at the level of countries or continents.
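
As a rough illustration of the fallback for users without geolocation, here is a minimal Python sketch. It is not the code used for these maps: the `ALIASES` and `GAZETTEER` tables, the `normalize` and `locate` helpers, and the approximate coordinates are all hypothetical stand-ins for the manual canonicalization and the lat/long database described above.

```python
import unicodedata

# Hypothetical alias table: normalized profile strings -> canonical placename.
ALIASES = {
    "dublin": "Dublin",
    "baile atha cliath": "Dublin",
    "bac": "Dublin",
}

# Hypothetical gazetteer: canonical placename -> (lat, lon), approximate values.
GAZETTEER = {
    "Dublin": (53.3498, -6.2603),
}

def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())

def locate(profile_location):
    """Return (lat, lon) for a self-reported location, or None if unknown."""
    canonical = ALIASES.get(normalize(profile_location))
    return GAZETTEER.get(canonical) if canonical else None
```

With these tables, `locate("Baile Átha Cliath")` and `locate("BÁC")` both resolve to the single Dublin entry, while a free-form string such as "American ex-pat living in Galway" returns `None` and would be left for manual handling.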

Scottish Gaelic Twitter conversations (CC-BY-SA)
Canonicalizing the placenames takes a bit of manual labor, for a few reasons.  First, sometimes people will give their location in their profile as something like "American ex-pat living in Galway", and the geolocation services I've tried usually fail on strings like this.  Second, many people tweeting in indigenous or minority languages give their location in their native language, and for languages like Welsh, Cornish, Māori and so on, these names are often missing from geolocation databases.  Finally, there are misspellings and other noise in people's profiles that are best handled manually.

Scottish Gaelic, Great Britain and Ireland only (CC-BY-SA)
So at this point I have good coordinates for between 50% and 60% of the users listed on the Indigenous Tweets pages.  I then gather all tweets from the database that are in the desired language and in which one user "mentions" another.  If I have coordinates for both the sender and the mentioned user, I simply draw an arc of a great circle on the map connecting the two points.  I rendered the maps using the statistical package R, which has libraries that make this sort of thing very easy (nice tutorial here, for example).
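
For readers who want to see what "an arc of a great circle" involves, here is a minimal sketch in Python rather than R. It uses the standard spherical interpolation formula on a perfect sphere; the function name `great_circle_points` and its parameters are made up for this illustration, and it is not the routine used to render the maps.

```python
import math

def great_circle_points(lat1, lon1, lat2, lon2, n=50):
    """Return n+1 (lat, lon) points in degrees along the great-circle arc."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    # Coincident or antipodal endpoints: the arc is degenerate or not unique,
    # so this sketch just returns the endpoints.
    if math.sin(d) < 1e-12:
        return [(lat1, lon1), (lat2, lon2)]
    points = []
    for i in range(n + 1):
        f = i / n
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        # Weighted sum of the two endpoint unit vectors.
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        lat = math.atan2(z, math.hypot(x, y))
        lon = math.atan2(y, x)
        points.append((math.degrees(lat), math.degrees(lon)))
    return points
```

The resulting list of points can be handed to any plotting library as a polyline. Arcs that cross the 180° meridian would need to be split before drawing on a flat map, which this sketch does not attempt.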

It's very common for a large number of conversations to take place between two specific points.   For example, there have been 5878 Welsh language tweets sent from Caerdydd that mention a user in Caernarfon, and 1519 Irish language tweets sent from An Cheathrú Rua that mention a user in Baile Átha Cliath.  In such cases, I've scaled the brightness of the arcs so that these frequent paths show up more prominently on the maps.
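
One way such scaling could work is sketched below in Python. The post does not say which scaling was used, so the logarithmic curve, the `floor` parameter, and the function name `arc_brightness` are assumptions made purely for illustration.

```python
import math
from collections import Counter

def arc_brightness(mentions, floor=0.2):
    """Map each (sender_place, mentioned_place) pair to a brightness in [floor, 1].

    `mentions` is a list with one (sender_place, mentioned_place) tuple per tweet.
    Log scaling (an assumption) keeps rare paths visible next to very busy ones.
    """
    counts = Counter(mentions)
    if not counts:
        return {}
    top = math.log1p(max(counts.values()))
    return {
        pair: floor + (1 - floor) * math.log1p(count) / top
        for pair, count in counts.items()
    }
```

The busiest path always gets brightness 1, and a path with a single tweet still gets at least `floor`, so it does not vanish from the map.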

Breton language Twitter conversations (CC-BY-SA)
I'm not a linguist or sociolinguist so it's not really my place to draw conclusions about linguistic geography, language vitality, or anything else from these maps. It's best to leave this to members of the language communities themselves, who will have the best understanding of the local situation.  That said, I want to address a couple of issues people raised on Twitter after I posted the Irish, Basque and Māori maps.

Cornish language Twitter conversations (CC-BY-SA)
The most striking thing about the Basque map is how compact it is geographically, especially when compared to the Irish map where we see many conversations between Ireland, North America, continental Europe and even Brazil.  In contrast, all of the Basque conversations take place within the Basque Country, roughly speaking.   And the Welsh map, which appears here for the first time, looks much more like the Basque map than the Irish one, with just a small percentage of tweets involving a user outside of Wales, most of those to and from London.  Does this mean that somehow Irish is a more "international" language than the other two, or that the Irish-speaking diaspora is more engaged with the language?  It might, but more careful research would be needed to establish this.  My guess is that the Welsh and Basque communities look more compact in part because I'm only displaying the top-500 users in each case.  Since these languages have such vibrant communities on Twitter, the bar is set extremely high to make it into the top-500 tweeters (currently, the 500th most active tweeter in Welsh has 1073 tweets in the language, for Basque the number is 1958, but for Irish it's just 176), and I expect that users with thousands of tweets in the language are more likely to live in the traditional homeland where the language is still used on a daily basis by the local community.
Manx Gaelic Twitter conversations (CC-BY-SA)

A word or two regarding the Manx map.  Of the six Celtic languages, Manx has the smallest number of users on Twitter and probably the smallest number of speakers also.   Several users have "Isle of Man", "Ellan Vannin" (or variants thereof) as their location (and no more specific location on the island).  Because of this, I normalized all locations on the island to a single lat/long, and therefore (disappointingly) the map doesn't show what I expect is actually an interesting network of communication taking place on the island; instead it just shows the conversation pathways between the island and three users off the island.

Finally, a word about privacy.   I haven't plotted locations at a granularity finer than a city or town except in cases where users have explicitly activated geolocation for their tweets.  And even in those cases, since the maps are at a pretty large scale, it's impossible to pinpoint the exact location of any particular user.  That said, not everyone will be so scrupulous with your data, and if the idea of a stranger plotting your movements on a map creeps you out (I think it should), you should deactivate geolocation on your Twitter account (under Settings, go to "Security and Privacy", and then make sure the box next to "Add location to my tweets" is unchecked).  If you don't want anyone to know where you are at all, you can also remove your location from your Twitter profile (Settings → Profile → Location).   And if you don't want sites like Indigenous Tweets to have access to your tweets at all, the easiest solution is to make your Tweets private (Settings → Profile, and tick the box next to "Protect my tweets").




2012-10-15

Facebook in your language

It's been a long time since I posted anything here.  The Indigenous Tweets project is still going strong, and the number of languages we're tracking on Twitter continues to grow - we added the 138th and 139th languages (Inari and South Saami) to the site a couple of weeks ago.  Last week, the team at Twitter was nice enough to feature Indigenous Tweets on their "Twitter Stories" site; you can read that piece here.

Since January, I've spent a lot of time working on another project aimed at encouraging indigenous language groups to use their languages in social media.  What we're trying to do is produce translations of Facebook's interface (the menus, navigation, etc.) into as many languages as possible.

You may be aware that Facebook has a nice system in place that allows volunteers to translate the site into about 100 different languages, including a number of languages that we care about here, like Irish, Cherokee, Northern Sámi, and Aymara.  This is about the same as the number of language teams currently translating Mozilla Firefox (105) and somewhat less than the number of languages the Google search interface is available in (150). 

The trouble is, neither Facebook nor Google has added any new languages to their translation systems for quite a while.  In the case of Google, this is stated explicitly in their translation FAQ: "Right now, we're unable to support more languages in GIYL".  We haven't been able to reach anyone at Facebook about this, but we've heard second-hand that they have had problems with spam translations and poor quality from some of the smaller translation teams.  Whatever the reason, there are hundreds of language groups out there actively using Facebook to communicate in their language, but who are forced to use the site in English, Spanish, etc.  This flies in the face of Facebook's stated aim to "make Facebook available in every language across the world".

To solve this problem for his own language of Secwepemctsín, the late Neskie Manuel came up with a clever solution using a technology called Greasemonkey.  His code acts as a kind of "overlay" that runs in your web browser; as you navigate pages on Facebook, they are sent across the network to you in English, but then can be translated on the fly in your browser.

At one level this is just a "hack", and even Neskie viewed it as a temporary workaround: "It would be good to be able to use the official Facebook Translations App, but Secwepemctsín isn’t listed. Until then, we can use this script."  Personally, I think it's a bigger, more important idea than that.  What it means is that any language group can undertake a translation without having to wait for Facebook's approval or permission, and the same approach works in theory for Google or other popular web sites that aren't open to translation.   I've been working on open source software translations for more than ten years, and have contributed to the Irish translations of Mozilla Firefox, LibreOffice, KDE, etc.  I've strongly advocated [PDF] for an open source approach among indigenous language groups who are just starting out on software translation, because it means that the community itself can maintain control and ownership of their work, instead of having to rely on the goodwill of a big, for-profit corporation.  The trouble we're facing now, however, is that more and more of the software we use is "software as a service": Gmail instead of Mozilla Thunderbird, Google Docs instead of LibreOffice, etc., or social media sites like Twitter and Facebook.  This trend puts control of the online "linguistic landscape" firmly back in the hands of big corporations.  Neskie's approach gives us a way to maintain a measure of control over the language we choose to use online.

The response to this project has been overwhelming.  More than 60 different language groups have started translations, and we already have more than 30 that are in a usable state.  About two-thirds of these languages are endangered according to the UNESCO Atlas of the World's Languages in Danger, and in the majority of cases, I'm aware of no previous efforts to translate software into the language.

Doing a "complete" translation is quite easy.  Depending on how much terminology you have to make up, it can take as little as a couple of hours of work. I've picked out around 200 of the most common messages that appear on Facebook to be translated.  Of course this is only a small fraction of the entire site (which would be overwhelmingly large for a small language group to undertake), but by choosing these 200 messages carefully, we're able to achieve a convincing immersive experience in the target language with a minimum of effort.

There are a few technical terms needing translation (e.g. "Mobile Uploads", "email address", "Apps", "Cookies"), some site-specific jargon ("to like/unlike", "to poke someone", "status update"), and western concepts that have been difficult to render in some indigenous languages ("Privacy", "Advertising").   A useful technique for terminology creation is to see how other languages have dealt with a given concept.  To help with this, I've asked everyone who has contributed a new Facebook translation to also provide "back translations" of some of these tricky terms into English, in the hope that some of these might be helpful to new translators.   These back translations are stored on the project wiki, and we welcome additional contributions in any language.

I should also say that you don't need to translate all 200 messages if you don't want to.  For a language that is rarely, if ever, seen on the computer, I think there's great symbolic value in even a translation of just a few key words, for example "Like", "Unlike", "Comment", and "Share".

Would you like to try translating Facebook into your language?  Leave a comment below and I can send you detailed instructions!

2011-12-23

1000 Languages on the Web

Click to see the full size image

About the Image

Since 2003 I've been gathering texts from the web written in indigenous and minority languages.  The image above is a "family tree" of the 1000 languages I've found to date, where proximity in the tree is measured by a straightforward statistical comparison of writing systems (details below).
  • When you load the full image it will be too big to fit in a browser window and you may not see anything at first; you'll need to use the horizontal and vertical scrollbars to explore different parts of the tree (most browsers will let you zoom in and out also).  And because it's an SVG image, you can use your browser's search functionality (probably Ctrl+F or ⌘-F) to find different language codes, although the search behavior can be a bit weird/unpredictable.
  • Each language is colored according to its linguistic family (details here).  For example, all Indo-European languages are greenish colors, with different subfamilies (Celtic, Germanic, etc.) being slightly different shades of green.  I also tried to use similar colors for languages from the same geographical region even when there is no known genetic relationship among them, and so Arawakan, Quechuan, Tucanoan languages (all from South America) are shades of purple, while Central and North American languages are shades of blue.
  • Clicking on a language opens a new tab or window with the documentation page for the ISO 639-3 language identifier where you'll find a name for the language in English and a link to its Ethnologue page for additional information.
  • What I'm calling "languages" are really "writing systems"; you'll see, for example, separate nodes for bo (Tibetan) and bo-Latn (Tibetan written in Latin script).  In a small number of cases I track macrolanguages, regional variants (e.g. en, en-IE, en-ZA), and some dialects.  In total, there are 919 distinct ISO 639-3 codes among the 1000 writing systems represented.
I'm using these data in collaboration with language groups all around the world to develop basic resources that help people use their language online: keyboard input methods, spell checkers, online dictionaries, and so on.  This work also underlies the Indigenous Tweets and Indigenous Blogs projects, which aim to strengthen languages through social media.  You can learn more about how indigenous and minority language communities are using the web, social media, and technology to help revitalize their languages by following us on Twitter.

The Gory Details

Everything is based on an analysis of three-character sequences ("3-grams") in the different languages. It turns out that computing the statistics of 3-grams in a given language provides a "fingerprint" that can be used for language identification and a number of other applications.  Specifically, imagine the huge-dimensional vector space V whose axes are labelled with all possible 3-grams of Unicode characters (dim V > 10^15).  Given a collection of texts in a language, you can compute the frequencies of all 3-grams that appear in the collection, defining a (sparse) vector in V "representing" the language.  We then define the distance between two languages to be the angle between their representative vectors in V.  This can be computed by scaling the vectors to unit length and computing their dot product (which is the cosine of the angle we want).
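
The distance computation can be sketched in a few lines of Python. This is a minimal illustration of the idea described above, not the scripts from the project; the function names are invented here, and a real pipeline would presumably clean and normalize the texts first.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Sparse 3-gram frequency vector: maps each 3-character sequence to its count."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * v[gram] for gram, count in u.items() if gram in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def angular_distance(u, v):
    """Angle in radians between two 3-gram vectors, in [0, pi/2] for counts."""
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine(u, v))))
```

Because counts are never negative, the angle always falls between 0 (identical 3-gram profiles) and π/2 (no 3-grams in common), which is what makes all languages lie within a bounded distance of each other.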

Once we know the distance between each pair of languages, we can reconstruct a phylogenetic tree using any of a number of well-known algorithms.  The image above was created using the so-called "neighbor-joining" algorithm (which basically builds the tree in a greedy, bottom-up way). A side-effect of the algorithm is that each edge in the tree is assigned a length, but note that the edge lengths in the rendered image have nothing to do with the computed edge lengths (indeed, it's unlikely that the tree can be rendered in a distance-preserving way in two dimensions).  Another side-effect of the algorithm is that the tree is connected: by definition, all languages are within a bounded distance of each other, and so near the root of the tree you'll see various languages which use completely different scripts joined in a more-or-less random fashion (Khmer, Georgian, Tamil, Cherokee, etc.).  It would be easy enough to tweak the distance function or the algorithm to render languages with different scripts as separate connected components.
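
To make the "greedy, bottom-up" description concrete, here is a compact Python sketch of textbook neighbor-joining. It is an illustration only, not the implementation behind the image: the function name, the dict-of-dicts input format, and the `node0`, `node1`, ... labels for internal nodes are choices made for this example.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from a symmetric distance table.

    `dist[a][b]` is the distance between leaves a and b.
    Returns a list of (node, node, edge_length) tuples.
    """
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Greedy step: pick the pair minimizing the Q criterion.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"
        next_id += 1
        # Edge lengths from the new internal node to the joined pair.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.extend([(u, a, len_a), (u, b, len_b)])
        # Distances from the new node to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges
```

Each pass through the loop joins one pair and replaces it with a new internal node, so the procedure always ends with a single connected tree, whatever the input distances are.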

How many languages are out there?

Ethnologue lists 6909 living languages in the world, but how many have some presence on the web?  The answer depends greatly on what kinds of documents you include.  If one takes linguistic studies into account, the number might be as high as 4000 – the Open Language Archives Community (OLAC) brings together data from linguistic archives all over the world into a single, searchable interface.  The OLAC coverage page shows, at present, the existence of online resources for 3930 of the 6909 Ethnologue languages, with more material coming online every day.  The amazing ODIN project harvests examples of interlinear glossed text from linguistic papers, and has over 1250 languages in its database.

The 1000 languages found by my web crawler are, for the most part, what you might call "primary texts": newspapers, blog posts, Wikipedia articles, Bible translations, etc.  My best guess at present is that around 1500 languages have primary texts of this kind on the web.  If you know of online resources written in a language that's not listed on our status page, please let me know in the comments.

Here are a couple of closely-related (but ill-defined) questions: first, "How many of the 6909 languages have a writing system?" and second, since a great number of the texts we've found are Bible translations or other evangelical works, one might ask "How many languages have a writing system that's used regularly by members of the speaker community?"  I've looked around a bit for answers to these questions but I haven't found any careful studies in the literature.


Mash it up!

I put all of the data and scripts needed to generate the image in a GitHub repository.  I'm not an expert on data visualization, so I'm hoping others will grab the data and experiment.  One idea would be to use a more sophisticated algorithm for reconstructing the tree, such as Fitch-Margoliash. In terms of the visualization itself, it would be cool to do something that connects the tree to locations on a world map where the languages are spoken. There are also some JavaScript/HTML5 graph viewers that might provide a better browsing experience.  Or you might simply select the colors in different ways (perhaps colors for different typological features: for example, SVO, VSO, etc.).  Feel free to post additional ideas in the comments!

Thanks

First, I'd like to thank the hundreds of people who have contributed to the project over the years by providing training texts in many of the languages, correcting errors in the language identification, editing word lists, and helping separate different dialects/orthographies.  You'll find many of their names on the project status page. Thanks also to Michael Cysouw who first suggested generating an image of this kind (you can find his image, created in 2005, on the main project page). Finally, thanks to my colleagues at Twitter for several helpful conversations and for their interest in the Indigenous Tweets project.

2011-12-06

Language revitalization through free software: the case of Aragonese

Aragonese is one of the minority languages of Spain, spoken in the autonomous community of Aragon in the northeastern part of the country.  With an estimated 10,000 native speakers, it is in a much more precarious position than its neighbors Catalan and Basque.  Nevertheless, there is a vibrant online Aragonese community that is working hard to develop free and open source resources to support and help revitalize the language.  One notable example is the tremendous volunteer effort that has gone into developing the Aragonese Wikipedia; weighing in at 25,000+ articles and 2.5 million words, it is believed to be the largest Wikipedia of any language, per number of native speakers.  For this interview, I spoke with two leading figures in the Aragonese online community about their work on behalf of the language: Santiago Paricio, a high school teacher of Spanish in Navarra, and Juan Pablo Martínez, a university professor in the Engineering School at the University of Zaragoza.


Santi Paricio (L) and Juan Pablo Martínez (R)
KPS: Please tell us a little bit about the Aragonese language, how many speakers there are currently, whether it's taught in schools, etc.

SP/JPM: Although there are no official data, it is estimated that Aragonese is spoken by some 10,000 native speakers in the north of Aragon (less than 1% of the Aragonese population) plus an indeterminate number of second-language speakers. The number of native speakers is decreasing dramatically, mainly due to the decline in intergenerational transmission. In most areas, only older people use the language. In contrast, there is a certain interest among young and middle-aged people in learning the language in areas where it is no longer spoken as a native language. Some of them are even raising their children in Aragonese.

But it has not always been like this. Aragonese was once spoken in almost all of Aragon and was one of the administrative languages of the Kingdom of Aragon. However, it has suffered a constant decline and progressive substitution by Spanish since the 15th century.

The language is only being taught as a voluntary subject at five primary schools in the north of Aragon. Since 2010, with the passage of the “Law on Languages of Aragon”, the language has had minimal legal recognition from the local government. However, the Act, which established a Language Regulator Body (Academy) and voluntary classes at all educational levels in the regions where the language is still spoken, has hardly been developed, and the new local Administration elected in May 2011 has announced that they will reform the Act, which they opposed, rather than develop it. According to the UNESCO Atlas of Endangered Languages, Aragonese is categorized as “definitely endangered”.

You can hear the sound of Aragonese at the Archivo Audiovisual del Aragonés.

KPS: What opportunities are there to use the language online?

SP/JPM: In Aragon, access to technology is not itself an issue. However, native speakers of Aragonese are a mainly aging and rural-based population, so their access to the Internet, computers, and ICT in general is on average lower than the rest of the population. Speakers of Aragonese as a second language are, in contrast, much more active on the Internet and, being more conscious of the language, they tend to use the language more often.

There are not many sites or software translated into Aragonese.  Some examples are Mediawiki (the software to build wiki webpages like Wikipedia), some parts of Ubuntu and Firefox, and several other small programs.  There is a nonprofit association, Softaragones, in which we are also involved, promoting software localization for Aragonese.

Aragonese Wikipedia
As for resources, Wikipedia in Aragonese is probably the main one nowadays. It is a very active project (the most active Wikipedia in terms of size per number of speakers), and it now represents the largest corpus in Aragonese that can be found on the Internet (with the advantage of being free content). It has also attracted the attention of Aragonese mass media, with several interviews on the public radio station and a full-page story in the main newspaper. We are currently involved in developing open-source tools for the language: spell checkers, machine translation systems, online dictionaries… We can also highlight the efforts in the field of distance language learning; for example, the non-profit cultural association Nogará-Religada has launched distance courses in Aragonese in recent years, based on the Moodle platform and assisted by other technologies, such as VoIP.

However, lack of resources and translated software does not preclude the use of the language on the Internet: we can find a number of websites and blogs written in Aragonese, and even a recently-created digital newspaper. Although modest in absolute numbers, their relative prevalence is high, given the size of the Aragonese-speaking community. Social networks represent a good opportunity to use the language online, by creating online speaker communities (very important for a community that is so sparse in the “real world”), or just using the language for general communication purposes (taking advantage of the fact that intercomprehension with the majority language, Spanish, is not difficult).

KPS: Many speakers of indigenous and minority languages are reluctant to use their languages online.  What is the general attitude toward using the language online?  Are there any special obstacles that arise for Aragonese speakers? 

SP/JPM: Most native speakers wouldn’t even think about using the language online, because the language still carries the stigma of being “bad speaking”, a “useless language”, “only valid to speak about the rural world”.  Some don’t even feel comfortable using the language outside their family circle. This does not fully apply to the youngest generations who have received the language from their parents: they often have a better linguistic awareness, as a part of their identity, and are less reluctant to use the language online, at least when communicating with people they know. However, as most of them have not received any education in Aragonese, nor have they ever written the language, they often feel insecure about it. In contrast, speakers of Aragonese as a second language are more likely to use Aragonese online, not only as a communication tool with other Aragonese-speaking Internet users, but also as an activist decision to promote the language. We think that the main driving forces for using the language online are activism and identity.


The proposed official orthography
KPS: How is/was computing terminology developed?  Is there a "language board" or are terms developed naturally by the community?  If there are official terms, how are they communicated to the community?

SP/JPM: The lack of established computing terminology also holds in the case of Aragonese. The community usually adapts the most commonly used terms from Spanish or Catalan to Aragonese, but there is not always a single solution.  For lesser-used, more specific terms, we can mention the community working on the Aragonese Wikipedia as a source for terminology.  Softaragones has also developed a “collection of computing terms” and a style guide for software localization and translation, but this is mainly useful for advanced users and translators, rather than for regular users. Due to the lack of response from the administration, the II Congress of Aragonese created an unofficial regulatory board, the “Academia de l’Aragonés”, in 2006. Together with their proposal of an interdialectal spelling system (PDF), they published some guidelines on the adaptation of technical words, which has somewhat reduced the multiplicity of possible solutions.  In brief, development of computing terminology is needed in Aragonese, but its absence does not preclude online use of the language.

KPS: Are there other special challenges your community faces in terms of developing technology for the language and/or communicating online?

SP/JPM: We believe the adoption of a single spelling system would be crucial to boost the generation of new resources. The 2010 proposal of the Academia de l’Aragonés linked above has not reached full consensus, but it is the spelling system most widely used in the generation of new online content (e.g., in the Aragonese Wikipedia and in the online newspaper Arredol), as well as among the most active online users (for example, it is used by 25 of the 26 top tweeters listed on the Indigenous Tweets Aragonese page). As a consequence, the open source linguistic tools now under development are using this spelling system. Another issue is that of dialectal variation. While dialectal differences do not cause communication problems, it is necessary to provide the dialects with tools such as spellcheckers and/or translators (or at least to take them into account, as there is not a strong standard dialect). In general, dialects are not represented enough online.
Bilingual signs on a hiking trail (CC-BY)

Of course, since we are such a small minority, software vendors and service providers do not show interest in including localizations for Aragonese, to say nothing of developing linguistic resources. We must find the way forward for our language in open source/free software projects, which allow the reuse or adaptation of technologies and resources developed for other languages. An example of this is Apertium, a free/open source machine translation project which has just released a first version of an Aragonese-Spanish bidirectional translator (the latest version can be tested here or here). These projects also promote cooperation between developers interested in different lesser-used languages or language lovers in general. Another example is the release of an Aragonese spell checker, which already has extensions for Mozilla products and LibreOffice.

KPS: Are young people using the language online?  Do you think social media sites like Facebook and Twitter are helping encourage language use by younger speakers?

SP/JPM: Yes, mostly young people use the language online. Until a couple of years ago, the use of the language online was mostly limited to some second-language speakers and activists.  Recently, social networks like Facebook and Twitter have opened new chances to use the language, to connect with other speakers, and are seen as a window to show the language and the community. This has indeed encouraged the use of Aragonese by younger speakers, now including native speakers, who have shifted their oral communication habits to these new modalities.  This is very good, as it puts people speaking different dialects in contact with each other, and also native speakers with second-language speakers, improving the feeling of being a community.

KPS: What is your vision for your language in ten years, both in general terms and in terms of software/online use?

Aragonese-speaking village of Ansó (CC-BY-SA)
SP/JPM: It is difficult to say.  The dream scenario would be that children in the speaking areas would be able to learn the language at school, and children in the rest of Aragon would have the opportunity to learn it. Aragonese society should also be more aware of the cultural value of their own language. With support from the Administration and Civil Society, the objective of preserving intergenerational transmission and increasing language vitality could be achieved.  In terms of online use, the aim would be that Aragonese speakers find the tools and resources to use their language online (translators, spellcheckers, speech synthesis and recognition, localized applications…), to get and create content in their language, and to use it correctly. 

In more realistic terms, we believe that the use of the language online and the availability of online/computer language resources will indeed increase in the coming years, and this will open opportunities for the language, but this by itself does not guarantee the survival of Aragonese.  The language must be transmitted to the children, and they need to learn to read and write the language at school.  Otherwise, the efforts we are undertaking in the “digital world” might be useless.  On the positive side, while decades ago it was already thought to be very close to extinction, Aragonese is still a living language in the 21st century, and we are working to keep it alive.

2011-11-11

"Murdered on its native territory": Jordan Kutzik on Yiddish

Yiddish is a Germanic language traditionally spoken by Ashkenazi Jews in Central and Eastern Europe and in diaspora communities around the world.  Prior to World War II, it was the mother tongue of more than 10 million people, and had a thriving written tradition, with newspapers, scholarly works, and a modern literature being produced in the language.  This came to an abrupt halt with the Holocaust, which left the vast majority of Yiddish speakers dead, and saw the survivors scattered to all corners of the globe.  Although the language remains relatively strong among certain Hasidic and Orthodox Jewish communities, outside of those communities it faces many of the same obstacles as other minority languages in terms of encouraging its use among the younger generation, and guaranteeing intergenerational transmission.

Jordan Kutzik just finished his BA at Rutgers University in Jewish Studies and Spanish, focusing in particular on the Yiddish language and Spanish translation.  He is currently working at the National Yiddish Book Center in Amherst, Massachusetts as a fellow.

KPS: For readers not familiar with your language, tell us a bit about the history of Yiddish and its current status.
Jordan Kutzik

JK: The history of Yiddish and its current status are much more complicated than those of any other indigenous or minority language, except perhaps Romani, because the language was murdered on its native territory. Today it exists in different pockets of speaker communities descended from immigrants from Eastern Europe on four different continents, and the language’s “strength” or “health” varies by community, country, and of course how one decides to measure it.

Yiddish, a Germanic language written in the Hebrew alphabet, was the mother-tongue of around 11 million people, 8 million of them in Eastern Europe prior to the Holocaust (Ukraine, Poland, Belarus, parts of Russia, Lithuania, Latvia, Romania, Moldova, etc.), with immigrant communities around the world.  In its Eastern European heartland it was the language of Jews of all levels of religious affiliation and the language of various schooling systems, from secular schools to traditional religious academies.  Yiddish had an important literature of religious materials and original secular literature as well as translations from other languages and more than 100 daily newspapers, some of which were of a very high quality, on par with the national newspapers in other languages of the time period. The common language throughout Eastern Europe promoted a common ethnic identity among Ashkenazi Jews (those who traced their ancestry to Germany) and Yiddish was the strongest non-territorial language in the world, especially in terms of written material.  Right as the language was coming into its own in a modern sense, the Holocaust left around 6 million Jews dead in Europe, including 5.5 million Yiddish speakers.  The genocide not only killed its speakers, but more devastatingly for Yiddish it all but destroyed the civilization in which it had been the natural language.  Although by my own estimates around 1.25 million Yiddish speakers survived the war (most fleeing deep into the USSR, some surviving concentration camps, in partisan units, blending in with the surrounding population, joining the Russian army, etc.), the communities and institutions in which the language lived did not, and the vast majority of survivors left Eastern Europe for the Americas or British Mandate Palestine and later Israel.

In America the language died out in immigrant Jewish communities, just as most immigrant languages eventually die out, and in Israel the language was strongly discouraged, and in some spheres actually outlawed, in favor of Hebrew, so for the most part it was not passed on for more than one generation there either.  After World War II, the USSR gained the Baltics and Poland and the strength of Yiddish among those few Jews who remained declined even further as the USSR enacted strong anti-Jewish national programs in Poland and Ukraine and to a lesser extent Lithuania.  Yiddish did survive, however, among Hungarian Hasidim (who despite the name came not just from Hungary but also parts of Romania and Poland) for whom it largely remains the lingua franca whether these communities are in New York, Israel, Belgium, England, Canada or Australia.  In these communities Yiddish is the language of schools and religious academies, some media (newspapers, magazines, radio shows done through telephone hotlines, etc.) and the home.  In New York there are around 100,000 Hasidic Yiddish speakers and the population is extremely young and growing rapidly as the average family has 7 or 8 children.  There are about the same number of Yiddish speakers among Orthodox Jews in Israel as well, although the number there is tougher to gauge as language of the home is not asked as part of the census.  There are perhaps 20,000 Yiddish-speaking Orthodox Jews in Antwerp, and perhaps a similar figure in both Montréal and London.  So a figure of 250,000 Hasidic Yiddish-speaking Jews is a fair guesstimate and the language is healthiest among these communities, being spoken by people of all ages.

Outside of the Hasidic world, Yiddish survived as the lingua franca of many Holocaust survivors and many of their children speak it too.  There are still probably around 200,000 Yiddish-speaking Holocaust survivors, with the majority in the USA and Israel.  But this population is very elderly and unfortunately will be gone in the coming decades.  Additionally, the language never died out entirely as a language of culture in Jewish communities in America, Latin America, Australia, France and Israel, and there are still non-Hasidic Yiddish language publications around the world.  There are, however, very few families who have kept the language alive as the language of the home and of raising children outside of the Hasidic world.  My generation has seen a bit of a revival as I know several hundred young people (age 16-30) like myself who have learned the language to fluency and I know a few dozen families who are raising their children as Yiddish speakers even though it was the mother-tongue of neither parent.  This is something I particularly hope to see more of in the coming years.  There are Yiddish courses in several dozen universities around the world, and some non-Hasidic Jewish day-schools teach Yiddish, although only a few do it so that the children leave with any real fluency. Among non-Hasidic Jewish schools Yiddish is strongest today in Australia.

In Lithuania with Fania Brantsovsky
As far as official status goes, Yiddish has official status in the Jewish autonomous region of Russia known as Birobidzhan (near Korea!), but there are very few Jews there and few of them speak Yiddish.  Many non-Jews there learn Yiddish in the schools, however, some extremely well, and there are even government signs on courthouses and such in Yiddish, making it the only place in the world with actual Yiddish signage on public buildings.  Yiddish has token recognition in Israel, along with Ladino, the language of Jews who left Spain after the expulsion of 1492, but for all intents and purposes the Israeli government doesn’t do much to support Yiddish.  Yiddish is also an official minority language of Sweden, Holland, Poland, Romania, and Ukraine under the European Charter for Regional or Minority Languages but not much is done on its behalf by these governments.

KPS: How have you been personally involved with language revitalization and activism on behalf of Yiddish?

JK: I have been involved with Yiddish language revitalization/activism for the past four years in various capacities.  I am a board member of Yugntruf Youth for Yiddish, an organization which promotes Yiddish among young people around the world and most especially in the NYC area.  Almost all of our events are run exclusively in Yiddish, most prominently our “Yiddish Week” which attracts around 150 people from around the world.  I am particularly active with Yugntruf’s Facebook and Twitter presence, as well as finding young Yiddish speakers in unexpected places around the world through the internet.  I also run a Yiddish-themed YouTube channel with lots of films of tours in Yiddish with English subtitles with Fania Brantsovsky, the librarian of the Vilnius Yiddish Institute and a Holocaust survivor and former partisan.  I didn’t know how to make/edit films when I made the channel so most of the films aren’t of the highest quality but there is a lot of interesting and important stuff there about Yiddish, the Holocaust, Jewish culture, etc.  Now that I’ve learned how to shoot/edit film properly I will have higher quality films in the future.  I also work as both a freelance (paid) translator and a volunteer translator for people using Yiddish language source materials for research involving the Holocaust, for creative writing projects, historical research, etc.  I copyedit an online web-journal connected with the Yiddish Farm project and have a blog in Yiddish that desperately needs to be updated.  I also tweet in Yiddish on my personal Twitter feed and run a Twitter feed dedicated to publicizing Yiddish classes and immersion opportunities (@yiddishclasses). 

KPS: What opportunities are there to use the language online?  Are there websites translated into your language?  What about software and other resources like web browsers, office software, spell checkers?

JK: Most Yiddish online now is computer generated, as Google Translate is available in Yiddish.  It is quite poor, actually, because if you translate a text with a word in plural form it won’t actually translate it but rather transliterate it into the Hebrew alphabet.  But when you search for a Yiddish word now, most of the websites that come up are computer-generated Google translations of other sites, which makes it more difficult to find websites that were actually written in Yiddish.  Among non-Google translated websites in Yiddish there are some Yiddish language publications, some Yiddish organizations, some Hasidic message-boards, a few Yiddish bands and so forth with Yiddish websites.  Almost all of these sites are also in a national language like English, Hebrew, French or Polish and usually the Yiddish site itself is far less extensive than the versions in other languages. It is particularly strange and frustrating to me that none of the websites for Holocaust survivors run in Yiddish.  There is also a Yiddish Wikipedia with some 7,000 articles (largely written by two very dedicated men), a Yiddish version of Google search, and some Jewish communal organizations, especially in Eastern Europe, have summary pages in Yiddish.  There is also an excellent online dictionary created by Refoyl Finkel.

KPS: Many speakers of indigenous and minority languages are reluctant to use their languages online, for various reasons.  How do speakers of your language feel about using the language online?

JK: With the internet and Yiddish there are three distinct communities: Hasidic, Yiddishist and heritage.  Hasidic Jews are, generally speaking, not supposed to be on the internet according to the rules of their own communities, or are only supposed to use the internet for business, in which case they will probably be doing so in a national language.  Many are online, however, and there is a lot of informal Yiddish language internet use among them on message boards, Twitter, Facebook, etc.  Most Yiddish-speaking Orthodox Jews on the internet, however, use English, Hebrew, French or Dutch as these languages are more widely understood, so Yiddish usage is usually restricted to intra-community affairs, especially when they want to keep non-Hasidic Jews out.

A few Yiddishists like myself have set up Yiddish blogs, Twitter accounts, Facebook pages and so forth in an effort to make the language more visible.  We also have Yiddish language Google groups and so forth.  Oftentimes we use Yiddish online as a matter of principle even though we could be communicating in another language.

Trilingual sign (English/Spanish/Yiddish) in Brooklyn, NY
Some heritage Yiddish speakers, often the children of Holocaust survivors, will use Yiddish if they find that they don’t have another language in common with another person.  This sometimes overlaps with the Yiddishist community as well.  For instance I’ve written people at Jewish communal organizations in France and Brazil about things that had nothing to do with Yiddish just to get a response that they didn’t speak English and asking if I spoke Hebrew or Yiddish!  Far more people, and probably far more French Jews for that matter, speak English than Yiddish, but in some cases my knowledge of Yiddish proved to make communication possible where it wouldn’t have been otherwise.  So there is some non-ideologically based internet Yiddish use going on too.  I never run into that type of thing when I email a Jew in say, England or Mexico because I speak/write English and Spanish but with Brazil and France it happens occasionally. So in that sense the internet has actually gotten people to use the language more often than they would have otherwise because people are meeting online who would not meet otherwise and would otherwise have no practical use for the language.

Actually using Yiddish, however, poses some technical challenges.  Yiddish uses a modified form of the Hebrew alphabet and makes use of some vowel markings and diacritical markings that are not used in Hebrew.  Many people don’t know how to use the Hebrew keyboard or the Yiddish keyboard programs that have been developed, and most people who can write Hebrew can’t write the special characters used for Yiddish with their Hebrew word-processing programs.  Furthermore, many online programs have problems displaying right-to-left languages like Yiddish, and have particular difficulties displaying Yiddish, so things like periods, commas, and exclamation points will end up on the wrong side of a line.  On Twitter the vowel markings get counted as an extra character and to make matters worse they often do not display correctly!!! A friend of mine who is very good with computers tried to make a “Twitter friendly” Yiddish program with pre-combined characters but Twitter still split the characters up.  This makes it much easier to leave out the vowel markings and diacritical marks on Twitter, but some sticklers would rather tweet shorter messages or not tweet in Yiddish at all than tweet without using the proper Yiddish spelling.  Most Hasidic Jews, as well as myself sometimes, forgo the vowel markings and diacritical markings on the internet and especially on Twitter because it really can be a headache.  I use a transliteration machine to type Yiddish, which means I can’t write Yiddish in a chat program like Facebook messages, so there I’ll transliterate the language into the Latin alphabet.  I do the same thing with text messages in Yiddish.
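The splitting of pre-combined characters described above is consistent with how Unicode normalization treats Hebrew-script letters, although the interview does not say that this is the cause. As a minimal sketch, assuming a service normalizes text to NFC before counting characters: the precomposed Yiddish letters in the Hebrew presentation forms block (such as U+FB2E, alef with patah) are excluded from composition, so NFC turns them back into a base letter plus a separate vowel point. The function name `nfc_length` is only illustrative.

```python
import unicodedata

# Precomposed "alef with patah" from the Hebrew presentation forms block.
PRECOMPOSED_ALEF_PATAH = "\uFB2E"
# The same letter typed as a base letter (alef) plus a combining point (patah).
DECOMPOSED_ALEF_PATAH = "\u05D0\u05B7"


def nfc_length(text: str) -> int:
    """Count code points after NFC normalization (an assumed counting rule)."""
    return len(unicodedata.normalize("NFC", text))


# Both spellings end up as two code points after NFC, so the
# precomposed form saves nothing under this counting rule.
print(nfc_length(PRECOMPOSED_ALEF_PATAH))  # 2
print(nfc_length(DECOMPOSED_ALEF_PATAH))   # 2
```

Under that assumption, a keyboard program that emits precomposed letters cannot shorten the count, which matches the experience described here.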

A bunch of us tried to organize a massive effort to translate Facebook into Yiddish since they were using crowdsourced translations but it just didn’t take off.  There is a Yiddish translation for BlackBerry and a few smartphones have been made for Hasidic Jews in Israel in Yiddish.

KPS: I mentioned above that many indigenous languages lack computing terminology.  Is this an issue for your language?  How is/was terminology developed?

JK: As far as vocabulary goes, most Yiddish speakers learned to use a computer in another language, but since Yiddish is sometimes the only common language among people using it online there has been a slight tendency toward the creation of neologisms.  Most of these are unknown among Hasidic Yiddish speakers and are only used by Yiddishists, but a dozen or so, including some of the most essential like blitspost (“email” as a category), blitsbriv (an individual email), vebzaytl (website), and shleptop (laptop), have caught on in both the Hasidic and Yiddishist world.  Blits means lightning in Yiddish, so the words for email mean “lightning mail” or “lightning letter.”  Veb means “web” and zaytl means “page” so that renders “webpage” but it also echoes the English “website” as the pronunciation is similar.  Older Yiddish words like the words for screen, document, keyboard, erase, save, etc. have been naturally given newer meanings, but you’ll also see English or Hebrew equivalents being used and transliterated to Yiddish spellings too.  For basic everyday computer usage it’s never a problem, and there are basic computer classes in Yiddish for Yiddish-speaking Hasidic Jews taught over the internet, but I doubt anyone is doing complicated programming in Yiddish on a regular basis, with the exception of some database work cataloging literature which was done at an Israeli university.

KPS: Are there other special challenges your community faces in terms of developing technology for the language and/or communicating online?

JK: There is an academic standard written Yiddish spelling but most speakers don’t use it.  This really doesn’t cause any problems in computer usage or reading the language because everyone except students just beginning to read/write is familiar with variations in spelling.  This does cause problems, however, when someone wants to make searchable databases.
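One partial way to soften the database problem mentioned above is to index each word under a simplified search key. The sketch below is only an illustration, not a description of any existing Yiddish catalog: it assumes that much of the variation a search has to tolerate is in vowel points and in whether the double-letter ligatures are typed as one character or two, and it does nothing for genuinely different spellings of the same word. The names `LIGATURES` and `search_key` are hypothetical.

```python
import unicodedata

# Yiddish ligature characters mapped to their two-letter equivalents.
LIGATURES = {
    "\u05F0": "\u05D5\u05D5",  # double vav
    "\u05F1": "\u05D5\u05D9",  # vav yod
    "\u05F2": "\u05D9\u05D9",  # double yod
}


def search_key(text: str) -> str:
    """Fold a word to an unpointed, ligature-free form for indexing."""
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (vowel points, rafe, dagesh and so on).
    unpointed = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Expand ligatures so one-character and two-character typings match.
    return "".join(LIGATURES.get(ch, ch) for ch in unpointed)


# The word "yidish" with and without the hiriq point under the second yod.
pointed = "\u05D9\u05D9\u05B4\u05D3\u05D9\u05E9"
unpointed = "\u05D9\u05D9\u05D3\u05D9\u05E9"
print(search_key(pointed) == search_key(unpointed))  # True
```

The original spelling would still be stored and displayed as written; only the lookup key is simplified.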

KPS: What is your vision for your language in ten years, both in general terms and in terms of software/online use?

JK: Yiddish speakers need to organize to use resources and funding available from governments, especially in Europe, to teach Yiddish to more people, especially children.  I am particularly interested in the language-nest model and want to assemble a team of people down the road who could start an international non-profit with a steering committee to run language nests in Jewish communities where Yiddish was spoken before World War II and where it enjoys protection under the European Charter for Regional or Minority Languages.  There is also enormous potential for broadcast media in Yiddish done through the internet.  We have radio shows which double as podcasts and YouTube channels but we could really use something like a weekly TV show done as a podcast.  There is no local market that would justify the expense of a Yiddish TV show on TV as the Orthodox don’t use TVs, but now with the internet and archiving it could be done. And I think that any use of media, whether websites like Twitter, radio broadcasts, podcasts, or more traditional media like newspapers and magazines, helps to promote the language.

As far as online use, I’d like to see more Jewish organizations and governments, especially those that serve Yiddish speakers such as Holocaust survivors or Hasidic communities, have websites in Yiddish.  It’s absurd that the government of Sweden and the New York Health Department publish information online in Yiddish but the government of Israel does not.  German, French and American websites written for Holocaust survivors and their children should also have information available in Yiddish.  I’d also like to see a usable Facebook interface in Yiddish.  Obviously Facebook in Yiddish wouldn’t be practically useful like say a Health Department bulletin written for Hasidic Jews but it would be a really cool thing to be able to show to young people and say “hey, you can even use Facebook in Yiddish!”