For creating audio-books I use a text-to-speech engine. One problem is that the application dies on non-ASCII Unicode text. The documents that I encode are too long to correct manually, so I want the fix automated. The correction isn’t as simple as removing every non-ASCII character, though: where possible I don’t want to lose the meaning of a character that is easily converted to ASCII.
For example here are some transliterations that ought to occur:
- ¢ → cents
- © → copyright
- ™ → trademark
- ∀ → for all
- ♥ → heart
- ∂ → derivative
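The mappings listed above can be sketched with nothing but the standard library. This is a minimal illustration, not what any of the packages below do internally: the `SYMBOL_WORDS` table and the `to_ascii` helper are hypothetical names, the table only holds the six examples from the list, and anything it does not know is decomposed with NFKD and then dropped if it is still non-ASCII.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical hand-written table; the entries are the examples listed above.
SYMBOL_WORDS = {
    u"\u00a2": u"cents",       # cent sign
    u"\u00a9": u"copyright",   # copyright sign
    u"\u2122": u"trademark",   # trade mark sign
    u"\u2200": u"for all",     # for all
    u"\u2665": u"heart",       # black heart suit
    u"\u2202": u"derivative",  # partial differential
}


def to_ascii(text):
    """Replace known symbols with words, then drop any remaining non-ASCII."""
    for symbol, word in SYMBOL_WORDS.items():
        # Pad with spaces so the word does not fuse with its neighbours.
        text = text.replace(symbol, u" " + word + u" ")
    # NFKD splits an accented letter into base letter + combining mark,
    # so ignoring non-ASCII code points keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse the extra whitespace introduced by the padding.
    return u" ".join(ascii_text.split())
```

A call such as `to_ascii(u"5\u00a2")` yields `5 cents`. A hand-written table like this only covers what you remember to put in it, which is the reason to look for an existing package with broad coverage.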
I’m more concerned with not breaking the text-to-speech engine, but a large breadth of transliterations would be nice. With that in mind I started looking for solutions and whittling them down to one:
- Revision 1
  - Package name
  - GitHub URL
  - Programming language
  - Number of stars
- Revision 2 options. I want something well supported and easy to run.
  - #C: Number of committers
  - C: Most recent commit: Hours, Days, Months, Years
|Package|Language|Stars|Notes|#C|C|
|---|---|---|---|---|---|
|iki/unidecode|Python|75|Clone of. +. +.|8|Y|
|UnidecodeR|R|58|Good to know!|||
The Python port looks like the most actively maintained, and Python is always a good choice. The author’s discussion of his port is interesting for programmers. In theory we design systems that use Unicode even though we know that they’ll have to interoperate with ASCII-only systems. In practice that interoperation is usually an afterthought that results in well-hidden bugs and exploits. Kind of gets you wondering whether we would be better off building only ASCII systems today.
Here is how to get it set up with virtualenv on OS X and
This code should print 1114111 (not 65535):

    import sys
    print(sys.maxunicode)
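The two numbers correspond to the two Python 2 build types: 65535 (0xFFFF) is what a narrow "ucs2" build reports, and 1114111 (0x10FFFF) is what a wide "ucs4" build reports. A small sketch that names the build instead of printing the raw number could look like this; `unicode_build` is a hypothetical helper, not part of any library. Note that Python 3.3 and later always report 1114111, so the distinction only matters on older interpreters.

```python
import sys

NARROW_MAX = 0xFFFF    # 65535, reported by "ucs2" (narrow) builds
WIDE_MAX = 0x10FFFF    # 1114111, reported by "ucs4" (wide) builds


def unicode_build(maxunicode=None):
    """Name the build type implied by a sys.maxunicode value."""
    if maxunicode is None:
        # Default to the interpreter that is running this code.
        maxunicode = sys.maxunicode
    if maxunicode == WIDE_MAX:
        return "ucs4"
    if maxunicode == NARROW_MAX:
        return "ucs2"
    return "unknown"


print(unicode_build())
```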
This explains common CFFI errors on systems with both ucs2 and ucs4 installations that are “mixed up”:
Here is how you know that there is a problem:
This is about getting an ImportError about _cffi_backend.so with a message like Symbol not found: _PyUnicodeUCS2_AsASCIIString. This error occurs in Python 2 as soon as you mix “ucs2” and “ucs4” builds of Python. It means that you are now running a Python compiled with “ucs4”, but the extension module _cffi_backend.so was compiled by a different Python: one that was running “ucs2”. (If the opposite problem occurs, you get an error about
Here is the solution for doing a custom build with a custom CFFI and virtualenv, though pyenv is also mentioned.
More generally, the solution that should always work is to download the sources of CFFI (instead of a prebuilt binary) and make sure that you build it with the same version of Python than the one that will use it. For example, with virtualenv:
    virtualenv ~/venv
    cd ~/path/to/sources/of/cffi
    ~/venv/bin/python setup.py build --force # forcing a rebuild to make sure
    ~/venv/bin/python setup.py install
This will compile and install CFFI in this virtualenv, using the Python from this virtualenv.
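One way to confirm that the rebuild worked is to try importing the compiled backend from the virtualenv's Python and, if it fails, look for the `_PyUnicodeUCS2_` or `_PyUnicodeUCS4_` symbol in the error message. The sketch below is only an illustration: `diagnose` and `check_cffi_backend` are hypothetical helper names, and the reading of a UCS4 symbol as the mirror-image case is an assumption based on the quoted description of the "opposite problem".

```python
import importlib
import re


def diagnose(message):
    """Map an ImportError message to a likely ucs2/ucs4 mismatch, if any."""
    match = re.search(r"_PyUnicodeUCS([24])_", message)
    if match is None:
        return None
    # The missing symbol names the build the extension was compiled against.
    built_with = "ucs" + match.group(1)
    # Assumption: the running interpreter is the other build type.
    running = "ucs4" if built_with == "ucs2" else "ucs2"
    return "extension built with %s, interpreter is %s" % (built_with, running)


def check_cffi_backend():
    """Try to import the compiled CFFI backend; return (ok, detail)."""
    try:
        importlib.import_module("_cffi_backend")
    except ImportError as exc:
        return False, diagnose(str(exc)) or str(exc)
    return True, "import succeeded"


print(check_cffi_backend())
```

Run it with `~/venv/bin/python` so that the check uses the same interpreter that built the extension.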
Here is a start. It doesn’t build right now and I’m stuck. Pythonistas, what am I doing wrong here?