There's a free app called Hanbaobao which could work for you.
Otherwise, both Pleco and Hanping (Popup) have paid add-ons that work pretty well.
For copying and pasting text or reading digital books, I think the free version of the 读书 app can work as well.
CC-CEDICT is one obvious candidate, but there are other open-source dictionaries and segmented corpora online that may help you. I'm interested to hear more about your idea, if you're open to discussing it.
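For what it's worth, CC-CEDICT's plain-text format is easy to parse. Here's a minimal Python sketch (the regex and field names are my own, not from any official parser):

    import re

    # CC-CEDICT entries look like:
    #   Traditional Simplified [pin1 yin1] /definition 1/definition 2/
    LINE_RE = re.compile(r'^(\S+) (\S+) \[([^\]]+)\] /(.+)/$')

    def parse_cedict_line(line):
        """Parse one CC-CEDICT entry; returns None for comments and blanks."""
        line = line.strip()
        if not line or line.startswith('#'):
            return None
        match = LINE_RE.match(line)
        if not match:
            return None
        traditional, simplified, pinyin, defs = match.groups()
        return {'traditional': traditional, 'simplified': simplified,
                'pinyin': pinyin, 'definitions': defs.split('/')}

    # parse_cedict_line('中國 中国 [Zhong1 guo2] /China/')
    # -> {'traditional': '中國', 'simplified': '中国',
    #     'pinyin': 'Zhong1 guo2', 'definitions': ['China']}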
I released an app which I made for personal use, called HanBaoBao. It's a popup dictionary for Android (check it here: https://play.google.com/store/apps/details?id=com.tallogre.hanbaobao)
While developing it, I drew on many resources. What follows is the list of resources I compiled to help others, originally posted here
CC-CEDICT (https://cc-cedict.org/wiki/) - the main dictionary used by most free dictionary apps; it's very good.
Adso (https://github.com/wtanaka/adso) - another free dictionary (check the license). I believe it's primarily intended for machine translation and not human consumption. Particularly good for Part-of-speech (PoS) tagging information.
Nan Tien Institute (NTI) Buddhist Dictionary (https://github.com/alexamies/buddhist-dictionary). Based on CC-CEDICT, but adds many PoS tags, definitions, topics, & categories. For example, this tab-delimited row, where \N marks a null field (see the parsing sketch after this list):
19897 美式咖啡 \N Měishì kāfēi cafe Americano set phrase 饮料 Beverage 饮食 Food and Drink \N \N \N \N \N 19897
Unihan (http://www.unicode.org/charts/unihan.html & http://www.unicode.org/reports/tr38/) - The Unihan database, produced by The Unicode Consortium, provides brief English definitions (note that it contains only single-character data, no multi-character words), but it is more commonly used as a character reference, with information such as stroke counts, simplified <-> traditional mappings, pinyin readings, and dictionary cross-references.
StarDict Dictionaries (http://download.huzheng.org/zh_CN/) - Even though the site claims that these dictionaries are GPL, I doubt it. Be wary of these.
Lingoes Dictionaries (http://www.lingoes.net/en/dictionary/index.html) - I cannot vouch for the license for any of these dictionaries.
Wiktionary (https://dumps.wikimedia.org/zhwiki/latest/ for dumps) - Wiktionary is only semi-structured data and therefore would require some processing to make it usable as a translation dictionary.
Linguistic Data Consortium (LDC) Chinese-English Translation Lexicon (https://catalog.ldc.upenn.edu/LDC2002L27) - I don't believe that this dictionary is freely usable, but it's worth noting its existence.
A List of Chinese Names (http://technology.chtsai.org/namelist/) - This list of over 700K unique Chinese names was compiled from the Joint College Entrance Examination (JCEE) in Taiwan. I'm not certain how representative the names are of the greater Chinese population, but it may be useful regardless.
CC-Canto (cantonese.org) - A Cantonese dictionary that follows the CC-CEDICT format.
Linguistic Data Consortium (LDC) Chinese-to-English & English-to-Chinese Wordlists (https://github.com/ReubenBond/LdcChineseWordlists originally from https://web.archive.org/web/20130107081319/http://projects.ldc.upenn.edu/Chinese/LDC_ch.htm)
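Regarding the NTI example row above: the dump is tab-delimited, with MySQL-style \N markers for null fields. A minimal Python sketch for reading it (the field meanings are my guesses from the sample row, not the project's official schema):

    def parse_tsv_row(line):
        """Split a tab-delimited row, mapping \\N markers to None."""
        return [None if f == r'\N' else f
                for f in line.rstrip('\n').split('\t')]

    # For the 美式咖啡 row above, fields[1] is the headword, fields[3] the
    # pinyin, fields[4] the English gloss, and fields[5] the PoS tag
    # ('set phrase') -- again, guesses from inspection, so verify against
    # the project's documentation.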
Tatoeba (https://tatoeba.org/eng/downloads) - I haven't actually put the Tatoeba sentences to good use yet. To be honest, quite a few would need filtering & touching up: some sentences are just strange, some are quite vulgar, some seem to be extracts from books, but most are earnest & good.
Jukuu (http://www.jukuu.com/help/hezuo.htm) - Has a large data set, but it is only accessible as a web service, as far as I'm aware. They seem to be open to collaborative partnerships, however.
Shtooka Project (http://shtooka.net/index.php) - An online collection of pronunciations for many thousands of words in multiple languages, including over 9,000 Chinese words.
Forvo (http://forvo.com/languages/zh/) - Forvo is a paid service.
Speak Good Chinese (http://www.speakgoodchinese.org/wordlists.html) - Fairly small data set of pronunciations for individual syllables.
Popup Chinese (http://www.popupchinese.com/hsk/test)
hskhsk.com (http://www.hskhsk.com/word-lists.html)
Wiktionary (https://en.wiktionary.org/wiki/Appendix:HSK_list_of_Mandarin_words)
TOCFL Word list (http://www.sc-top.or...sh/download.php)
Word & character frequency data is useful for performing text segmentation (中文分词), as in HanBaoBao. Text segmentation will never be 100% accurate, particularly when performed on a mobile device, so you will most likely want to give users some way to see the alternatives. In HanBaoBao, users can tap a word multiple times to split it or join it with its neighbors (but only if there's a dictionary definition for the result). Internally this works by 'banning' the span of characters which you tap; once all possibilities are banned, I remove all the bans and the cycle repeats. I use a weighted directed acyclic graph of the valid segmentation paths through the sentence and determine the most probable segmentation from that graph (excluding the 'banned' spans). To speed things up (it's a slow process), I pre-split the input on punctuation and process each piece separately. This could be optimized further, but it's within acceptable performance bounds for now.
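To make that concrete, here's a minimal Python sketch of frequency-weighted segmentation over a DAG with 'banned' spans. This is my own reconstruction of the idea described above, not HanBaoBao's actual code; `dictionary` is assumed to map words to corpus frequencies:

    import math

    def segment(text, dictionary, banned=frozenset(), max_word_len=8):
        """Return the most probable segmentation of `text`, skipping any
        (start, end) spans listed in `banned`."""
        total = sum(dictionary.values())

        def cost(word):
            # Negative log probability; unknown single characters get a
            # large fallback cost so dictionary words are preferred.
            freq = dictionary.get(word)
            return -math.log(freq / total) if freq else 1e9

        n = len(text)
        best = [math.inf] * (n + 1)  # best[i]: min cost to segment text[:i]
        back = [0] * (n + 1)         # back[i]: start index of the last word
        best[0] = 0.0
        for end in range(1, n + 1):
            for start in range(max(0, end - max_word_len), end):
                word = text[start:end]
                # Edges in the DAG: dictionary words plus single characters.
                if (word not in dictionary and end - start > 1) \
                        or (start, end) in banned:
                    continue
                c = best[start] + cost(word)
                if c < best[end]:
                    best[end], back[end] = c, start
        words, i = [], n
        while i > 0:                 # walk backpointers to recover the path
            words.append(text[back[i]:i])
            i = back[i]
        return list(reversed(words))

Tapping a word would add its span to `banned` and re-run segment(), forcing the next-best split; pre-splitting on punctuation just means calling segment() on each punctuation-free chunk.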
Frequency data also helps with sorting definitions so that the most relevant ones come first. The well-established dictionary apps almost certainly do a better job in the relevance department; I haven't put much work into that yet.
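As a trivial illustration of the idea (my own sketch, with made-up data): rank lookup results by corpus frequency so common words surface first.

    # Hypothetical entries; 'freq' would come from one of the corpora below.
    entries = [
        {'word': '美式咖啡', 'gloss': 'cafe Americano', 'freq': 120},
        {'word': '美式', 'gloss': 'American-style', 'freq': 3500},
    ]
    ranked = sorted(entries, key=lambda e: e['freq'], reverse=True)
    # -> 美式 first, 美式咖啡 second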
It's worth noting that much of this data can't be assumed accurate: text segmentation software is often used to segment each corpus, so there's potential for a positive feedback loop.
OpenSubtitles2016 data set (http://opus.lingfil.uu.se/OpenSubtitles2016.php) - A huge corpus of auto-segmented subtitles (~8 GB of uncompressed XML)
Leiden Weibo Corpus (http://lwc.daanvanesch.nl/openaccess.php)
Jun Da (http://lingua.mtsu.edu/chinese-computing/)
Jieba Analysis (https://github.com/huaban/jieba-analysis) - I'm not sure where their data comes from.
Lancaster Corpus of Mandarin Chinese (http://www.lancaster.ac.uk/fass/projects/corpus/LCMC/)
Chinese WordNet (http://lope.linguistics.ntu.edu.tw/cwn2/)
SIGHAN Second International Chinese Word Segmentation Bakeoff (http://www.sighan.org/bakeoff2005/) - Contains hand-segmented/verified texts (thanks to Imron)
SUBTLEX-CH (http://www.ugent.be/...ents/subtlexch/)
Make Me a Hanzi (https://skishore.github.io/makemeahanzi/) - A very cool stroke-order animation tool & data set.
Wikipedia (https://commons.wikimedia.org/wiki/Commons:Chinese_characters_decomposition)
CJKLib (https://github.com/cburgmer/cjklib)
Unihan (see above) - Contains some character composition information, such as stroke counts
CJKDecomp (http://cjkdecomp.codeplex.com/)
Remembering Simplified Hanzi (https://github.com/rouseabout/heisig)
Best of luck! You can reach out to me on Twitter (@ReubenBond) if you want to discuss the app or maybe I can help you with some pre-processed data.
It's available on the Play Store now :D https://play.google.com/store/apps/details?id=com.tallogre.hanbaobao
It's live on the Play Store with many improvements :D https://play.google.com/store/apps/details?id=com.tallogre.hanbaobao
HànBǎoBāo does this for Android: https://play.google.com/store/apps/details?id=com.tallogre.hanbaobao
Hanbaobao has the same functionality and it is completely free:
https://play.google.com/store/apps/details?id=com.tallogre.hanbaobao
HànBǎoBāo is pretty handy, if you are comfortable reading on an Android tablet.