character set guessing library
William Pitcock b44a240c57 Merge pull request #11 from jlindgren90/master 3 years ago
m4 Update buildsys to fix linking issues under OS X 3 years ago
scripts Remove broken links to jira.atheme.org 4 years ago
src Add C++ guards to libguess.h. 3 years ago
.gitignore Update .gitignore. 3 years ago
COPYING Import libguess (based on audacious sources). 8 years ago
GIT-Access libguess is now on git. 7 years ago
Makefile Import libguess (based on audacious sources). 8 years ago
README Remove broken links to jira.atheme.org 4 years ago
autogen.sh Import libguess (based on audacious sources). 8 years ago
buildsys.mk.in Update buildsys to fix linking issues under OS X 3 years ago
config.guess Import libguess (based on audacious sources). 8 years ago
config.sub Import libguess (based on audacious sources). 8 years ago
configure.ac Fix build 3 years ago
extra.mk.in Fix build 3 years ago
install-sh Import libguess (based on audacious sources). 8 years ago
libguess.pc.in Import libguess (based on audacious sources). 8 years ago

README

libguess - a high-speed character set detection library
-------------------------------------------------------

libguess is free, but copyrighted software. See COPYING for details.

Using libguess in your programs:
--------------------------------

libguess has two functions:

- libguess_determine_encoding(const char *inbuf, int length, const char *region);

This detects a character set. Returns an appropriate charset name
that can be passed to iconv_open(). Region is the name of the language
or region that the data is related to, e.g. 'Baltic' for the Baltic states,
or 'Japanese' for Japan.

- libguess_validate_utf8(const char *inbuf, int length);

This employs libguess's DFA-based character set validation rules to ensure
that a string is pure UTF-8. GLib's UTF-8 validation functions are broken,
for example.

Just include libguess.h and link to libguess to get these functions in your
program. For your convenience, a pkg-config file is also supplied.

How does libguess work:
-----------------------

libguess employs discrete-finite automata to deduce the character set of the
input buffer. The advantage of this is that all character sets can be checked
in parallel, and quickly. Right now, libguess passes a byte to each DFA on the
same pass, meaning that the winning character set can be deduced as efficiently
as possible.

libguess is fully reentrant, using only local stack memory for DFA operations.

I found a bug:
--------------

Please report your bug using GitHub issues.