Cantonese self-learning guide

Learning Cantonese is hard

I grew up speaking Cantonese in country Australia in the 1980s. Sadly, I never learnt to read or write it. Looking back over several decades of learning other languages, I can see that I was unlikely to have succeeded in learning Cantonese, as the self-learning guides back then were grossly inefficient.

From an English speaker's point of view, Cantonese is a particularly hard language to learn:

  • Cantonese uses tones to distinguish words. English uses tones for emphasis.
  • Cantonese uses a steady staccato rhythm, which makes single-tone words easier to comprehend. English uses rolling rhythms such as sentence stress and accented syllables to distinguish polysyllabic words.
  • Cantonese uses characters, or logograms, which have no systematic guide to phonetics. English uses a phonetic alphabet.
  • Cantonese characters are really hard to type on a QWERTY keyboard.
  • Cantonese has an enormous number of homophones - words that have the same sound but different characters.

Last year, I started to learn Cantonese using modern techniques and found that things have improved immeasurably. So here I've put together some useful resources for self-learning Cantonese.

Standard Cantonese phonetics - Jyut Ping

The biggest difficulty in learning Cantonese characters is matching sounds to words. Cantonese characters (that is, traditional Chinese characters) provide little useful pronunciation information for learners. Occasionally, a Chinese character gives partial information about its pronunciation, but this is of limited value because the association is based on similarity to other existing characters.

It is thus one of the essential ironies of learning to read and write Chinese characters that you must learn yet another writing system - a system that phoneticises Cantonese. There used to be no generally accepted system of phoneticisation for Cantonese - over my lifetime, my Cantonese name has been spelt differently on my birth certificate, driver's license and bank account.

But I am now happy to report that Jyut Ping is the standard phonetic system for Cantonese. It's recommended by the Linguistic Society of Hong Kong. It's pretty great. Instead of awkward diacriticals or random indicator letters to indicate tones, Jyut Ping uses numbers to represent the 6 major tones, as well as the 3 less common ones. Numbers are a little bit trickier to learn at first, but they are much easier to use and type on keyboards.
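Because the tones are plain digits, romanised text is easy to process with a few lines of code. Here is a minimal Python sketch that splits a romanised string into sound and tone. It assumes every syllable is written as lowercase letters followed by a single tone digit from 1 to 6; the function name split_jyutping is my own invention for illustration, not part of any library.

```python
import re

# Assumption: one syllable = lowercase letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"([a-z]+)([1-6])")

def split_jyutping(text):
    """Return (sound, tone) pairs for each syllable in a romanised string."""
    return [(sound, int(tone)) for sound, tone in SYLLABLE.findall(text.lower())]

print(split_jyutping("jyut6 ping3"))  # [('jyut', 6), ('ping', 3)]
```

Having the tone as a separate number makes it simple to, say, group a word list by tone for listening practice.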

Crowd-sourced online dictionary: Cantodict

Traditional Chinese dictionaries are not fun to use. There's really no systematic way to break down Chinese characters. Character order is a hodge-podge of arbitrary rules - stroke count, stroke order etc. Some dictionaries even order characters in terms of meaning or shape. If you don't have a working understanding of how to write Chinese characters and how they sound, then you won't really be able to use a traditional dictionary.

That's all changed with the internet and smartphones. First, there now exists a wonderful online Cantonese dictionary, Cantodict. It is crowd-sourced: Cantonese speakers from all around the world are constantly adding to it. It is probably the most comprehensive English-Cantonese dictionary that has ever existed. It contains a treasure trove of information that would be hard to include in a compact dictionary. Looking up a word is as easy as typing it into the website.

That still leaves a seemingly intractable problem - how do you type Chinese characters into the website? It's easy if you cut-and-paste characters from another website, but what if you want to look up something from a book or a newspaper or a sign on the street?

Smartphones help you get around this problem. It turns out that touch interfaces on smartphones allow you to write characters directly into the phone, and really good machine-learning or artificial intelligence can figure out the character from your shaky hand-scribble.

Here's what to do:

  • enable traditional Chinese handwriting input on your iPhone. (It's a bit trickier on Android).
  • open Cantodict on your phone browser (Safari on iPhone)
  • tap on the search bar
  • choose Chinese character input
  • write your Chinese character
  • choose the correct one predicted by your phone
  • tap search

You may find that the layout of the Cantodict website is a bit hard to use. I had that problem, so I created a website called Cantolite that provides a better mobile experience for Cantodict on an iPhone.

Smart rote learning: spaced-repetition-software

One thing I've found out in learning languages is that at some point, you have to rote-learn a lot of things. There is no easy way to get around it. In the old days, you'd grind through a text-book, or a word-builder, or sit through listening classes. Sadly, these methods are not very efficient. The reason is that rote learning involves both recognition and recall. Learning from a book is great for recognition, but terrible for recall. Another reason is that the order of words in old textbooks is not based on everyday usage.

There is a smarter way. Research has now shown that flash cards are the best way to rote learn. Flash cards ensure you practice recall as much as recognition. As well, research has shown that there are optimal intervals of time between reviews of a flash card. Scheduling reviews at these intervals is called spaced repetition.
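To give a feel for the idea, here is a toy Python sketch of a spaced-repetition schedule. It is a deliberate simplification and not the algorithm any real program uses: the Card class, the review function, the doubling rule and the one-day reset are all assumptions I've made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Card:
    front: str
    back: str
    interval_days: int = 1  # days to wait before the next review

def review(card, remembered, growth=2):
    """Lengthen the gap after a successful recall, reset it after a miss."""
    if remembered:
        card.interval_days *= growth  # assumed rule: double the gap
    else:
        card.interval_days = 1        # assumed rule: start over tomorrow
    return card.interval_days

card = Card(front="食", back="sik6 - to eat")
for outcome in [True, True, False, True]:
    print(review(card, outcome))  # prints 2, 4, 1, 2 on separate lines
```

The point is that cards you know well drift further and further apart, while cards you forget come straight back, so your time goes where it is needed.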

You can now obtain comprehensive spaced-repetition software that can facilitate the rote learning of almost anything - words, concepts, sounds and images. In fact everything you need to learn Cantonese can be packaged as a flash card system. The one system I've used is Anki, which has a wonderful ecosystem of desktop apps, mobile apps, cloud-based connectivity, and a huge resource of freely-available pre-packaged flash cards. The $25 I spent purchasing the Anki mobile app was the best $25 I ever spent for learning languages.

One example where the advantage of spaced repetition is essential is learning Cantonese tones. From my experience with other languages, nothing beats tone practice with Anki tone flash cards. Books can't teach you how to listen, and no real human will sit and pronounce Cantonese tones to you until you get it. So if you want to brush up on Cantonese tones, use these Anki Cantonese flash cards.

Order of learning - Frequency and Chunks

There is so much data out there that if you know a bit of coding, you can quite easily make Anki flash card sets. However, for optimal learning, you have to construct your Anki card sets very carefully.
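As a rough illustration, Anki can import plain text files with one card per line and fields separated by tabs, so a card set can be produced with a few lines of Python. This is only a sketch: the function name to_anki_tsv and the two sample cards are my own assumptions, and a real deck would want more fields (audio, example phrases and so on).

```python
import csv
import io

def to_anki_tsv(cards):
    """Render (front, back) pairs as tab-separated text, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(cards)
    return buffer.getvalue()

# Hypothetical sample cards: character on the front, Jyut Ping and meaning on the back.
cards = [("食", "sik6 - to eat"), ("水", "seoi2 - water")]
print(to_anki_tsv(cards), end="")
```

Save the resulting text to a UTF-8 file and it can be loaded through Anki's import dialog.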

There are several reasons. First, if you learn things in the right order, the things you learn now will make the things you learn later easier. Second, you have to conserve your attention span. Realistically, you only have so much effort and attention. If you don't feel you are making any progress then you'll be discouraged. It's really important that you are learning things that are useful in the world at large.

To build useful Anki flash-card decks, the most important thing to do is to build the decks around frequency usage lists. These used to be laborious to generate, made by dedicated scholars who would carefully count words in newspapers and books. But with the explosion of data on the internet, it's now possible for almost any decent programmer to generate realistic frequency lists of radicals, characters, words and phrases. I've found that the order of a frequency list is the only order worth learning words in. You will have more chance of seeing those words in the world around you, on TV, on the internet, or in newspapers. This is AWESOME!
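Here is a minimal sketch of how such a character frequency list could be generated in Python. It assumes your corpus is already loaded as one string, and it only counts characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF); the helper names and the tiny sample sentence are made up for illustration.

```python
from collections import Counter

def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def character_frequencies(text):
    """Return the Chinese characters in text, most frequent first."""
    counts = Counter(ch for ch in text if is_cjk(ch))
    return [ch for ch, _ in counts.most_common()]

sample = "我食飯，你食麵，我飲水"  # stand-in for a real corpus
print(character_frequencies(sample))
```

With a real corpus (news articles, subtitles, forum posts), the top of the list is what you should learn first. Counting words rather than single characters would need a word segmenter, which is beyond this sketch.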

Another important thing I've found is that you have to trim your card sets to digestible chunks. Sure, you can download an epic 15,000 character set and grind through it. But it will take you years before you get through the whole set. As well, as there is a random element to the spaced-repetition algorithm, your chance of encountering the useful high-frequency words on any given day is actually quite low.

What I've found is that breaking up Chinese character sets into chunks of about 300 characters, taken from the most frequent characters, has been the best way of ordering my learning. It is better to learn the most frequent 300 characters really well before moving on to the next chunk. One reason to do this is psychological - you need a goal! Goals are really important to maintain a sense of progress. With a massive set of cards, there is no defined goal, you just grind on forever. Another good reason is that breaking it up into 300-card decks means you can easily go back to earlier decks for revision.
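Splitting a frequency-ordered list into decks is a one-liner. The sketch below assumes you already have the characters ranked from most to least frequent; the placeholder strings stand in for real characters, and the function name chunk is my own.

```python
def chunk(items, size=300):
    """Split an ordered list into consecutive decks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Placeholder data: 1000 items standing in for a ranked character list.
ranked = [f"char{rank}" for rank in range(1, 1001)]
decks = chunk(ranked)
print([len(deck) for deck in decks])  # [300, 300, 300, 100]
```

Each deck can then be exported as its own card set, so the first deck always holds the most frequent characters.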

Another thing you might consider is your daily attention span. You must calibrate the ratio of new and reviewed cards per day so that you can finish within a reasonable amount of time - for me about 20 minutes. This should be done by trial-and-error. If it takes any longer than 20 minutes, dial down the number of new cards. If it becomes easier, dial up more cards. I can assure you though that it will get easier and easier. Just make sure you don't kill yourself with a setting that is too high.
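The trial-and-error rule can be written down as a tiny Python function. This is just my rule of thumb made explicit: the step size of 2 cards, and treating "finished in under 20 minutes" as the signal that things have become easier, are assumptions rather than anything built into Anki.

```python
def adjust_new_cards(new_cards, minutes_taken, target_minutes=20, step=2):
    """Nudge the daily new-card count toward the target session length."""
    if minutes_taken > target_minutes:
        return max(0, new_cards - step)  # session ran long: dial down
    if minutes_taken < target_minutes:
        return new_cards + step          # session was short: dial up
    return new_cards

print(adjust_new_cards(10, minutes_taken=25))  # 8
print(adjust_new_cards(10, minutes_taken=15))  # 12
```

Apply it once every few days rather than daily, so a single bad session doesn't swing the setting.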

It also helps a lot to start by learning a list of radicals. Now, radicals turn out to be a loose concept in Chinese. It's the idea that most characters are compound characters, which are made up of simpler characters - radicals. There really is no finite set of clearly identifiable radicals that make up all Chinese characters.

There is a traditional set of 214 radicals, called the Kangxi radicals, which were collated in 1615. I tried learning these, but a lot of these radicals are no longer used very much, and some of the supposed radicals are actually compound characters in themselves! Instead I've found that a modernized set of the 100 most frequent radicals was better to get started on. These were actually useful, and since learning radicals is hard, a set of 100 was a nice digestible chunk.

So to sum up here's my learning program:

  • Cantonese tones - Anki set

  • 100 most frequent radicals - Anki set

  • 300 most frequent Chinese characters - Anki set

  • 300-600 most frequent Chinese characters

  • 600-1000 most frequent Chinese characters