File: PerlIO.pod

=head1 NAME

Encode::PerlIO -- a detailed document on Encode and PerlIO

=head1 Overview

It is very common to want to do encoding transformations when
reading or writing files, network connections, pipes etc.
If Perl is configured to use the new 'perlio' IO system then
C<Encode> provides a "layer" (see L<PerlIO>) which can transform
data as it is read or written.

Here is how the blind poet would modernise the encoding:

    use Encode;
    open(my $iliad,'<:encoding(iso-8859-7)','iliad.greek');
    open(my $utf8,'>:utf8','iliad.utf8');
    my @epic = <$iliad>;
    print $utf8 @epic;
    close($utf8);
    close($iliad);

In addition, the new IO system can also be configured to read/write
UTF-8 encoded characters (this is efficient):

    open(my $fh,'>:utf8','anything');
    print $fh "Any \x{0021} string \N{SMILEY FACE}\n";

Either of the above forms of "layer" specifications can be made the default
for a lexical scope with the C<use open ...> pragma. See L<open>.

Once a handle is open, its layers can be altered using C<binmode>.

Without any such configuration, or if Perl itself is built using the
system's own IO, then write operations assume that the file handle
accepts only I<bytes> and will C<die> if a character larger than 255 is
written to the handle. When reading, each octet from the handle becomes
a byte-in-a-character. Note that this default is the same behaviour
as bytes-only languages (including Perl before v5.6) would have,
and is sufficient to handle native 8-bit encodings e.g. iso-8859-1,
EBCDIC etc. and any legacy mechanisms for handling other encodings
and binary data.

In other cases, it is the program's responsibility to transform
characters into bytes using the C<Encode> API before doing writes, and to
transform the bytes read from a handle into characters before doing
"character operations" (e.g. C<lc>, C</\W+/>, ...).

You can also use PerlIO to convert larger amounts of data you don't
want to bring into memory.
For example, to convert between ISO-8859-1 (Latin 1) and UTF-8
(or UTF-EBCDIC in EBCDIC machines):

    open(F, "<:encoding(iso-8859-1)", "data.txt") or die $!;
    open(G, ">:utf8",                 "data.utf") or die $!;
    while (<F>) { print G }

    # Could also do "print G <F>" but that would pull
    # the whole file into memory just to write it out again.

More examples:

    open(my $f, "<:encoding(cp1252)")
    open(my $g, ">:encoding(iso-8859-2)")
    open(my $h, ">:encoding(latin9)")       # iso-8859-15

See also L<encoding> for how to change the default encoding of the
data in your script.

=head1 How does it work?

Here is a crude diagram of how filehandle, PerlIO, and Encode
interact.

  filehandle <-> PerlIO        PerlIO <-> scalar (read/printed)
                       \      /
                        Encode

When PerlIO receives data from either direction, it fills a buffer
(currently with 1024 bytes) and passes the buffer to Encode.
Encode tries to convert the valid part and passes it back to PerlIO,
leaving invalid parts (usually a partial character) in the buffer.
PerlIO then appends more data to the buffer, calls Encode again,
and so on until the data stream ends.

To do so, PerlIO always calls (de|en)code methods with CHECK set to 1.
This ensures that the method stops at the right place when it
encounters a partial character. The following is what happens when
PerlIO and Encode try to encode (from utf8) more than 1024 bytes
and the buffer boundary happens to be in the middle of a character.

   A   B   C   ....   ~     \x{3000}    ....
  41  42  43   ....  7E   e3   80   80  ....
  <- buffer --------------->
  << encoded >>>>>>>>>>
                       <- next buffer ------

Encode converts from the beginning to \x7E, leaving \xe3 in the buffer
because it is invalid (partial character).

Unfortunately, this scheme does not work well with escape-based
encodings such as ISO-2022-JP.

=head1 Line Buffering

Now let's see what happens when you try to decode from ISO-2022-JP and
the buffer ends in the middle of a character.

              JIS208-ESC   \x{5f3e}
   A   B   C   ....   ~   \e   $   B  |DAN | ....
  41  42  43   ....  7E   1b  24  42  43  46 ....
  <- buffer --------------------------->
  << encoded >>>>>>>>>>>>>>>>>>>>>>>

As you see, the next buffer begins with \x43. But \x43 is 'C' in
ASCII, which is wrong in this case because we are now in JISX 0208
area so it has to convert \x43\x46, not \x43. Unlike utf8 and EUC,
in escape-based encodings you can't tell if a given octet is a whole
character or just part of it.

Fortunately, PerlIO also supports a line buffer if you tell it to use
one instead of the fixed buffer. Since ISO-2022-JP is guaranteed to
revert to ASCII at the end of the line, a partial character can never
occur when the line buffer is used.

To tell PerlIO to use the line buffer, implement the -E<gt>needs_lines
method for your encoding object. See L<Encode::Encoding> for details.

Thanks to these efforts, most encodings that come with Encode support
PerlIO, but that still leaves the following encodings.

  iso-2022-kr
  MIME-B
  MIME-Header
  MIME-Q

Fortunately iso-2022-kr is hardly used (according to Jungshik) and
MIME-* are very unlikely to be fed to PerlIO because they are for mail
headers. See L<Encode::MIME::Header> for details.

=head2 How can I tell whether my encoding fully supports PerlIO ?

As of this writing, any encoding whose class belongs to Encode::XS and
Encode::Unicode works. The Encode module has a C<perlio_ok> method
which you can use before applying PerlIO encoding to the filehandle.
Here is an example:

  my $use_perlio = perlio_ok($enc);
  my $layer = $use_perlio ? "<:encoding($enc)" : "<:raw";
  open my $fh, $layer, $file or die "$file : $!";
  while(<$fh>){
    $_ = decode($enc, $_) unless $use_perlio;
    # ....
  }

=head1 SEE ALSO

L<Encode::Encoding>,
L<Encode::Supported>,
L<Encode::PerlIO>,
L<encoding>,
L<perlebcdic>,
L<perlfunc/open>,
L<perlunicode>,
L<utf8>,
the Perl Unicode Mailing List E<lt>perl-unicode@perl.orgE<gt>

=cut

File: Changes.e2x

#
# $Id: Changes.e2x,v 2.0 2004/05/16 20:55:15 dankogai Exp $
# Revision history for Perl extension Encode::$_Name_.
#

0.01  $_Now_
  Autogenerated by enc2xs version $_Version_.

File: EBCDIC.pm

package Encode::EBCDIC;
use strict;
use warnings;
use Encode;
our $VERSION = do { my @r = ( q$Revision: 2.2 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use XSLoader;
XSLoader::load( __PACKAGE__, $VERSION );

1;
__END__

=head1 NAME

Encode::EBCDIC - EBCDIC Encodings

=head1 SYNOPSIS

    use Encode qw/encode decode/;
    $posix_bc = encode("posix-bc", $utf8);     # loads Encode::EBCDIC implicitly
    $utf8     = decode("posix-bc", $posix_bc); # ditto

=head1 ABSTRACT

This module implements various EBCDIC-Based encodings. Encodings
supported are as follows.

  Canonical   Alias             Description
  --------------------------------------------------------------------
  cp37
  cp500
  cp875
  cp1026
  cp1047
  posix-bc

=head1 DESCRIPTION

To find how to use this module in detail, see L<Encode>.

=head1 SEE ALSO

L<Encode>, L<perlebcdic>

=cut

File: TW.pm

package Encode::TW;

BEGIN {
    if ( ord("A") == 193 ) {
        die "Encode::TW not supported on EBCDIC\n";
    }
}
use strict;
use warnings;
use Encode;
our $VERSION = do { my @r = ( q$Revision: 2.3 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use XSLoader;
XSLoader::load( __PACKAGE__, $VERSION );

1;
__END__

=head1 NAME

Encode::TW - Taiwan-based Chinese Encodings

=head1 SYNOPSIS

    use Encode qw/encode decode/;
    $big5 = encode("big5", $utf8); # loads Encode::TW implicitly
    $utf8 = decode("big5", $big5); # ditto

=head1 DESCRIPTION

This module implements traditional Chinese charset encodings as used
in Taiwan and Hong Kong. Encodings supported are as follows.

  Canonical       Alias                   Description
  --------------------------------------------------------------------
  big5-eten       /\bbig-?5$/i            Big5 encoding (with ETen extensions)
                  /\bbig5-?et(en)?$/i
                  /\btca-?big5$/i
  big5-hkscs      /\bbig5-?hk(scs)?$/i
                  /\bhk(scs)?-?big5$/i
                                          Big5 + Cantonese characters in Hong Kong
  MacChineseTrad                          Big5 + Apple Vendor Mappings
  cp950                                   Code Page 950
                                          = Big5 + Microsoft vendor mappings
  --------------------------------------------------------------------

To find out how to use this module in detail, see L<Encode>.

=head1 NOTES

Due to size concerns, C<EUC-TW> (Extended Unix Character), C<CCCII>
(Chinese Character Code for Information Interchange), C<BIG5PLUS>
(CMEX's Big5+) and C<BIG5EXT> (CMEX's Big5e) are distributed separately
on CPAN, under the name L<Encode::HanExtra>. That module also contains
extra China-based encodings.

=head1 BUGS

Since the original C<big5> encoding (1984) is not supported anywhere
(glibc and DOS-based systems use C<big5> to mean C<big5-eten>;
Microsoft uses C<big5> to mean C<cp950>), a conscious decision was made
to alias C<big5> to C<big5-eten>, which is the de facto superset of the
original big5.

The C<CNS11643> encoding files are not complete. For common C<CNS11643>
manipulation, please use C<EUC-TW> in L<Encode::HanExtra>, which contains
planes 1-7.

The ASCII region (0x00-0x7f) is preserved for all encodings, even
though this conflicts with mappings by the Unicode Consortium.

=head1 SEE ALSO

L<Encode>

=cut

File: CN/HZ.pm

package Encode::CN::HZ;

use strict;
use warnings;
use utf8 ();

use vars qw($VERSION);
$VERSION = do { my @r = ( q$Revision: 2.10 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use Encode qw(:fallbacks);

use parent qw(Encode::Encoding);
__PACKAGE__->Define('hz');

# HZ is a combination of ASCII and escaped GB, so we implement it
# with the GB2312(raw) encoding here. Cf. RFCs 1842 & 1843.

# not ported for EBCDIC.  Which should be used, "~" or "\x7E"?

sub needs_lines { 1 }

sub decode ($$;$) {
    my ( $obj, $str, $chk ) = @_;
    return undef unless defined $str;

    my $GB  = Encode::find_encoding('gb2312-raw');
    my $ret = substr($str, 0, 0);    # to propagate taintedness
    my $in_ascii = 1;                # default mode is ASCII.

    while ( length $str ) {
        if ($in_ascii) {             # ASCII mode
            if ( $str =~ s/^([\x00-\x7D\x7F]+)// ) {    # no '~' => ASCII
                $ret .= $1;

                # EBCDIC should need ascii2native, but not ported.
            }
            elsif ( $str =~ s/^\x7E\x7E// ) {    # escaped tilde
                $ret .= '~';
            }
            elsif ( $str =~ s/^\x7E\cJ// ) {     # '\cJ' == LF in ASCII
                1;                               # no-op
            }
            elsif ( $str =~ s/^\x7E\x7B// ) {    # '~{'
                $in_ascii = 0;                   # to GB
            }
            else {    # encounters an invalid escape, \x80 or greater
                last;
            }
        }
        else {        # GB mode; the byte ranges are as in RFC 1843.
            no warnings 'uninitialized';
            if ( $str =~ s/^((?:[\x21-\x77][\x21-\x7E])+)// ) {
                my $prefix = $1;
                $ret .= $GB->decode( $prefix, $chk );
            }
            elsif ( $str =~ s/^\x7E\x7D// ) {    # '~}'
                $in_ascii = 1;
            }
            else {                               # invalid
                last;
            }
        }
    }
    $_[1] = '' if $chk;    # needs_lines guarantees no partial character
    return $ret;
}

sub cat_decode {
    my ( $obj, undef, $src, $pos, $trm, $chk ) = @_;
    my ( $rdst, $rsrc, $rpos ) = \@_[ 1 .. 3 ];

    my $GB  = Encode::find_encoding('gb2312-raw');
    my $ret = '';
    my $in_ascii = 1;    # default mode is ASCII.

    my $ini_pos = pos($$rsrc);

    substr( $src, 0, $pos ) = '';

    my $ini_len = bytes::length($src);

    # $trm is the first of the pair '~~', then 2nd tilde is to be removed.
    # XXX: Is better C<$src =~ s/^\x7E// or die if ...>?
    $src =~ s/^\x7E// if $trm eq "\x7E";

    while ( length $src ) {
        my $now;
        if ($in_ascii) {    # ASCII mode
            if ( $src =~ s/^([\x00-\x7D\x7F])// ) {    # no '~' => ASCII
                $now = $1;
            }
            elsif ( $src =~ s/^\x7E\x7E// ) {    # escaped tilde
                $now = '~';
            }
            elsif ( $src =~ s/^\x7E\cJ// ) {     # '\cJ' == LF in ASCII
                next;
            }
            elsif ( $src =~ s/^\x7E\x7B// ) {    # '~{'
                $in_ascii = 0;                   # to GB
                next;
            }
            else {    # encounters an invalid escape, \x80 or greater
                last;
            }
        }
        else {        # GB mode; the byte ranges are as in RFC 1843.
            if ( $src =~ s/^((?:[\x21-\x77][\x21-\x7F])+)// ) {
                $now = $GB->decode( $1, $chk );
            }
            elsif ( $src =~ s/^\x7E\x7D// ) {    # '~}'
                $in_ascii = 1;
                next;
            }
            else {                               # invalid
                last;
            }
        }

        next if !defined $now;

        $ret .= $now;

        if ( $now eq $trm ) {
            $$rdst .= $ret;
            $$rpos = $ini_pos + $pos + $ini_len - bytes::length($src);
            pos($$rsrc) = $ini_pos;
            return 1;
        }
    }

    $$rdst .= $ret;
    $$rpos = $ini_pos + $pos + $ini_len - bytes::length($src);
    pos($$rsrc) = $ini_pos;
    return '';    # terminator not found
}

sub encode($$;$) {
    my ( $obj, $str, $chk ) = @_;
    return undef unless defined $str;

    my $GB  = Encode::find_encoding('gb2312-raw');
    my $ret = substr($str, 0, 0);    # to propagate taintedness;
    my $in_ascii = 1;                # default mode is ASCII.

    no warnings 'utf8';    # $str may be malformed UTF8 at the end of a chunk.

    while ( length $str ) {
        if ( $str =~ s/^([[:ascii:]]+)// ) {
            my $tmp = $1;
            $tmp =~ s/~/~~/g;    # escapes tildes
            if ( !$in_ascii ) {
                $ret .= "\x7E\x7D";    # '~}'
                $in_ascii = 1;
            }
            $ret .= pack 'a*', $tmp;    # remove UTF8 flag.
        }
        elsif ( $str =~ s/(.)// ) {
            my $s = $1;
            my $tmp = $GB->encode( $s, $chk || 0 );
            last if !defined $tmp;
            if ( length $tmp == 2 ) {    # maybe a valid GB char (XXX)
                if ($in_ascii) {
                    $ret .= "\x7E\x7B";    # '~{'
                    $in_ascii = 0;
                }
                $ret .= $tmp;
            }
            elsif ( length $tmp ) {        # maybe FALLBACK in ASCII (XXX)
                if ( !$in_ascii ) {
                    $ret .= "\x7E\x7D";    # '~}'
                    $in_ascii = 1;
                }
                $ret .= $tmp;
            }
        }
        else {    # if $str is malformed UTF8 *and* if length $str != 0.
            last;
        }
    }
    $_[1] = $str if $chk;

    # The state at the end of the chunk is discarded, even if in GB mode.
    # That results in the combination of GB-OUT and GB-IN, i.e. "~}~{".
    # Perhaps it is harmless, but further investigations may be required...

    if ( !$in_ascii ) {
        $ret .= "\x7E\x7D";    # '~}'
        $in_ascii = 1;
    }
    utf8::encode($ret);    # https://rt.cpan.org/Ticket/Display.html?id=35120
    return $ret;
}

1;
__END__

=head1 NAME

Encode::CN::HZ -- internally used by Encode::CN

=cut

File: Config.pm

#
# Demand-load module list
#
package Encode::Config;
our $VERSION = do { my @r = ( q$Revision: 2.5 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use strict;
use warnings;

our %ExtModule = (

    # Encode::Byte
    #iso-8859-1 is in Encode.pm itself
    'iso-8859-2'            => 'Encode::Byte',
    'iso-8859-3'            => 'Encode::Byte',
    'iso-8859-4'            => 'Encode::Byte',
    'iso-8859-5'            => 'Encode::Byte',
    'iso-8859-6'            => 'Encode::Byte',
    'iso-8859-7'            => 'Encode::Byte',
    'iso-8859-8'            => 'Encode::Byte',
    'iso-8859-9'            => 'Encode::Byte',
    'iso-8859-10'           => 'Encode::Byte',
    'iso-8859-11'           => 'Encode::Byte',
    'iso-8859-13'           => 'Encode::Byte',
    'iso-8859-14'           => 'Encode::Byte',
    'iso-8859-15'           => 'Encode::Byte',
    'iso-8859-16'           => 'Encode::Byte',
    'koi8-f'                => 'Encode::Byte',
    'koi8-r'                => 'Encode::Byte',
    'koi8-u'                => 'Encode::Byte',
    'viscii'                => 'Encode::Byte',
    'cp424'                 => 'Encode::Byte',
    'cp437'                 => 'Encode::Byte',
    'cp737'                 => 'Encode::Byte',
    'cp775'                 => 'Encode::Byte',
    'cp850'                 => 'Encode::Byte',
    'cp852'                 => 'Encode::Byte',
    'cp855'                 => 'Encode::Byte',
    'cp856'                 => 'Encode::Byte',
    'cp857'                 => 'Encode::Byte',
    'cp858'                 => 'Encode::Byte',
    'cp860'                 => 'Encode::Byte',
    'cp861'                 => 'Encode::Byte',
    'cp862'                 => 'Encode::Byte',
    'cp863'                 => 'Encode::Byte',
    'cp864'                 => 'Encode::Byte',
    'cp865'                 => 'Encode::Byte',
    'cp866'                 => 'Encode::Byte',
    'cp869'                 => 'Encode::Byte',
    'cp874'                 => 'Encode::Byte',
    'cp1006'                => 'Encode::Byte',
    'cp1250'                => 'Encode::Byte',
    'cp1251'                => 'Encode::Byte',
    'cp1252'                => 'Encode::Byte',
    'cp1253'                => 'Encode::Byte',
    'cp1254'                => 'Encode::Byte',
    'cp1255'                => 'Encode::Byte',
    'cp1256'                => 'Encode::Byte',
    'cp1257'                => 'Encode::Byte',
    'cp1258'                => 'Encode::Byte',
    'AdobeStandardEncoding' => 'Encode::Byte',
    'MacArabic'             => 'Encode::Byte',
    'MacCentralEurRoman'    => 'Encode::Byte',
    'MacCroatian'           => 'Encode::Byte',
    'MacCyrillic'           => 'Encode::Byte',
    'MacFarsi'              => 'Encode::Byte',
    'MacGreek'              => 'Encode::Byte',
    'MacHebrew'             => 'Encode::Byte',
    'MacIcelandic'          => 'Encode::Byte',
    'MacRoman'              => 'Encode::Byte',
    'MacRomanian'           => 'Encode::Byte',
    'MacRumanian'           => 'Encode::Byte',
    'MacSami'               => 'Encode::Byte',
    'MacThai'               => 'Encode::Byte',
    'MacTurkish'            => 'Encode::Byte',
    'MacUkrainian'          => 'Encode::Byte',
    'nextstep'              => 'Encode::Byte',
    'hp-roman8'             => 'Encode::Byte',
    #'gsm0338'               => 'Encode::Byte',
    'gsm0338'               => 'Encode::GSM0338',

    # Encode::EBCDIC
    'cp37'     => 'Encode::EBCDIC',
    'cp500'    => 'Encode::EBCDIC',
    'cp875'    => 'Encode::EBCDIC',
    'cp1026'   => 'Encode::EBCDIC',
    'cp1047'   => 'Encode::EBCDIC',
    'posix-bc' => 'Encode::EBCDIC',

    # Encode::Symbol
    'dingbats'      => 'Encode::Symbol',
    'symbol'        => 'Encode::Symbol',
    'AdobeSymbol'   => 'Encode::Symbol',
    'AdobeZdingbat' => 'Encode::Symbol',
    'MacDingbats'   => 'Encode::Symbol',
    'MacSymbol'     => 'Encode::Symbol',

    # Encode::Unicode
    'UCS-2BE'  => 'Encode::Unicode',
    'UCS-2LE'  => 'Encode::Unicode',
    'UTF-16'   => 'Encode::Unicode',
    'UTF-16BE' => 'Encode::Unicode',
    'UTF-16LE' => 'Encode::Unicode',
    'UTF-32'   => 'Encode::Unicode',
    'UTF-32BE' => 'Encode::Unicode',
    'UTF-32LE' => 'Encode::Unicode',
    'UTF-7'    => 'Encode::Unicode::UTF7',
);

unless ( ord("A") == 193 ) {
    %ExtModule = (
        %ExtModule,
        'euc-cn'         => 'Encode::CN',
        'gb12345-raw'    => 'Encode::CN',
        'gb2312-raw'     => 'Encode::CN',
        'hz'             => 'Encode::CN',
        'iso-ir-165'     => 'Encode::CN',
        'cp936'          => 'Encode::CN',
        'MacChineseSimp' => 'Encode::CN',

        '7bit-jis'      => 'Encode::JP',
        'euc-jp'        => 'Encode::JP',
        'iso-2022-jp'   => 'Encode::JP',
        'iso-2022-jp-1' => 'Encode::JP',
        'jis0201-raw'   => 'Encode::JP',
        'jis0208-raw'   => 'Encode::JP',
        'jis0212-raw'   => 'Encode::JP',
        'cp932'         => 'Encode::JP',
        'MacJapanese'   => 'Encode::JP',
        'shiftjis'      => 'Encode::JP',

        'euc-kr'      => 'Encode::KR',
        'iso-2022-kr' => 'Encode::KR',
        'johab'       => 'Encode::KR',
        'ksc5601-raw' => 'Encode::KR',
        'cp949'       => 'Encode::KR',
        'MacKorean'   => 'Encode::KR',

        'big5-eten'      => 'Encode::TW',
        'big5-hkscs'     => 'Encode::TW',
        'cp950'          => 'Encode::TW',
        'MacChineseTrad' => 'Encode::TW',

        #'big5plus'           => 'Encode::HanExtra',
        #'euc-tw'             => 'Encode::HanExtra',
        #'gb18030'            => 'Encode::HanExtra',

        'MIME-Header' => 'Encode::MIME::Header',
        'MIME-B'      => 'Encode::MIME::Header',
        'MIME-Q'      => 'Encode::MIME::Header',

        'MIME-Header-ISO_2022_JP' => 'Encode::MIME::Header::ISO_2022_JP',
    );
}

#
# Why not export ? to keep ConfigLocal Happy!
#
while ( my ( $enc, $mod ) = each %ExtModule ) {
    $Encode::ExtModule{$enc} = $mod;
}

1;
__END__

=head1 NAME

Encode::Config -- internally used by Encode

=cut

File: Byte.pm

package Encode::Byte;
use strict;
use warnings;
use Encode;
our $VERSION = do { my @r = ( q$Revision: 2.4 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use XSLoader;
XSLoader::load( __PACKAGE__, $VERSION );

1;
__END__

=head1 NAME

Encode::Byte - Single Byte Encodings

=head1 SYNOPSIS

    use Encode qw/encode decode/;
    $greek = encode("iso-8859-7", $utf8);  # loads Encode::Byte implicitly
    $utf8  = decode("iso-8859-7", $greek); # ditto

=head1 ABSTRACT

This module implements various single byte encodings. For most cases it uses
\x80-\xff (upper half) to map non-ASCII characters. Encodings
supported are as follows.

  Canonical      Alias                                Description
  --------------------------------------------------------------------
  # ISO 8859 series
  (iso-8859-1 is built-in)
  iso-8859-2     latin2                                       [ISO]
  iso-8859-3     latin3                                       [ISO]
  iso-8859-4     latin4                                       [ISO]
  iso-8859-5                                                  [ISO]
  iso-8859-6                                                  [ISO]
  iso-8859-7                                                  [ISO]
  iso-8859-8                                                  [ISO]
  iso-8859-9     latin5                                       [ISO]
  iso-8859-10    latin6                                       [ISO]
  iso-8859-11
  (iso-8859-12 is nonexistent)
  iso-8859-13    latin7                                       [ISO]
  iso-8859-14    latin8                                       [ISO]
  iso-8859-15    latin9                                       [ISO]
  iso-8859-16    latin10                                      [ISO]

  # Cyrillic
  koi8-f
  koi8-r         cp878                                    [RFC1489]
  koi8-u                                                  [RFC2319]

  # Vietnamese
  viscii

  # all cp* are also available as ibm-*, ms-*, and windows-*
  # also see L<http://msdn.microsoft.com/en-us/library/aa752010%28VS.85%29.aspx>

  cp424
  cp437
  cp737
  cp775
  cp850
  cp852
  cp855
  cp856
  cp857
  cp860
  cp861
  cp862
  cp863
  cp864
  cp865
  cp866
  cp869
  cp874
  cp1006
  cp1250         WinLatin2
  cp1251         WinCyrillic
  cp1252         WinLatin1
  cp1253         WinGreek
  cp1254         WinTurkish
  cp1255         WinHebrew
  cp1256         WinArabic
  cp1257         WinBaltic
  cp1258         WinVietnamese

  # Macintosh
  # Also see L<http://developer.apple.com/technotes/tn/tn1150.html>
  MacArabic
  MacCentralEurRoman
  MacCroatian
  MacCyrillic
  MacFarsi
  MacGreek
  MacHebrew
  MacIcelandic
  MacRoman
  MacRomanian
  MacRumanian
  MacSami
  MacThai
  MacTurkish
  MacUkrainian

  # More vendor encodings
  AdobeStandardEncoding
  nextstep
  hp-roman8

=head1 DESCRIPTION

To find how to use this module in detail, see L<Encode>.
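
As a rough illustration of the single-byte mapping described in this module's abstract, the sketch below round-trips three Greek letters through C<iso-8859-7>. The sample string and the variable names are assumptions chosen for the example; only C<encode> and C<decode> come from Encode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: Greek small letters alpha, beta, gamma as characters.
my $text = "\x{3b1}\x{3b2}\x{3b3}";

# Asking for iso-8859-7 loads Encode::Byte on demand.
my $octets = encode( "iso-8859-7", $text );

# Each character becomes exactly one octet; print them in hex.
printf "%d octets: %s\n", length($octets),
  join( " ", map { sprintf( "%02x", ord($_) ) } split( //, $octets ) );

# Decoding with the same encoding name restores the original characters.
my $back = decode( "iso-8859-7", $octets );
print "round trip ok\n" if $back eq $text;
```

Because every character maps to a single octet, the encoded length equals the character count, which is what distinguishes these encodings from the multi-byte ones handled by other Encode submodules.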

=head1 SEE ALSO

L<Encode>

=cut

File: encode.h

#ifndef ENCODE_H
#define ENCODE_H

#ifndef H_PERL
/* check whether we're "in perl" so that we can do data parts without
   getting extern references to the code parts
*/
typedef unsigned char U8;
#endif

typedef struct encpage_s encpage_t;

struct encpage_s
{
    /* fields ordered to pack nicely on 32-bit machines */
    const U8 *const seq;         /* Packed output sequences we generate
                                    if we match */
    const encpage_t *const next; /* Page to go to if we match */
    const U8 min;                /* Min value of octet to match this entry */
    const U8 max;                /* Max value of octet to match this entry */
    const U8 dlen;               /* destination length -
                                    size of entries in seq */
    const U8 slen;               /* source length -
                                    number of source octets needed */
};

/*
  At any point in a translation there is a page pointer which points
  at an array of the above structures.

  Basic operation :
  get octet from source stream.
  if (octet >= min && octet < max) {
    if slen is 0 then we cannot represent this character.
    if we have less than slen octets (including this one) then
       we have a partial character.
    otherwise
      copy dlen octets from seq + dlen*(octet-min) to output
      (dlen may be zero if we don't know yet.)
      load page pointer with next to continue.
      (if slen is one this is end of a character)
      get next octet.
  }
  else {
    increment the page pointer to look at next slot in the array
  }

  arrays SHALL be constructed so there is an entry which matches
  ..0xFF at the end, and either maps it or indicates no
  representation.

  if MSB of slen is set then mapping is an approximate "FALLBACK" entry.

*/

typedef struct encode_s encode_t;
struct encode_s
{
    const encpage_t *const t_utf8;  /* Starting table for translation from
                                       the encoding to UTF-8 form */
    const encpage_t *const f_utf8;  /* Starting table for translation
                                       from UTF-8 to the encoding */
    const U8 *const rep;            /* Replacement character in this
                                       encoding e.g. "?" */
    int replen;                     /* Number of octets in rep */
    U8 min_el;                      /* Minimum octets to represent a
                                       character */
    U8 max_el;                      /* Maximum octets to represent a
                                       character */
    const char *const name[2];      /* name(s) of this encoding */
};

#ifdef H_PERL
/* See comment at top of file for deviousness */

extern int do_encode(const encpage_t *enc, const U8 *src, STRLEN *slen,
                     U8 *dst, STRLEN dlen, STRLEN *dout, int approx,
                     const U8 *term, STRLEN tlen);

extern void Encode_DefineEncoding(encode_t *enc);

#endif /* H_PERL */

#define ENCODE_NOSPACE      1
#define ENCODE_PARTIAL      2
#define ENCODE_NOREP        3
#define ENCODE_FALLBACK     4
#define ENCODE_FOUND_TERM   5

/* Use the perl core value if available; it is portable to EBCDIC */
#ifdef REPLACEMENT_CHARACTER_UTF8
#  define FBCHAR_UTF8 REPLACEMENT_CHARACTER_UTF8
#else
#  define FBCHAR_UTF8 "\xEF\xBF\xBD"
#endif

#define ENCODE_DIE_ON_ERR      0x0001 /* croaks immediately */
#define ENCODE_WARN_ON_ERR     0x0002 /* warn on error; may proceed */
#define ENCODE_RETURN_ON_ERR   0x0004 /* immediately returns on NOREP */
#define ENCODE_LEAVE_SRC       0x0008 /* $src updated unless set */
#define ENCODE_PERLQQ          0x0100 /* perlqq fallback string */
#define ENCODE_HTMLCREF        0x0200 /* HTML character ref. fb mode */
#define ENCODE_XMLCREF         0x0400 /* XML character ref. fb mode */
#define ENCODE_STOP_AT_PARTIAL 0x0800 /* stop at partial explicitly */

#define ENCODE_FB_DEFAULT      0x0000
#define ENCODE_FB_CROAK        0x0001
#define ENCODE_FB_QUIET        ENCODE_RETURN_ON_ERR
#define ENCODE_FB_WARN         (ENCODE_RETURN_ON_ERR|ENCODE_WARN_ON_ERR)
#define ENCODE_FB_PERLQQ       (ENCODE_PERLQQ|ENCODE_LEAVE_SRC)
#define ENCODE_FB_HTMLCREF     (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
#define ENCODE_FB_XMLCREF      (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)

#endif /* ENCODE_H */

File: CN.pm

package Encode::CN;

BEGIN {
    if ( ord("A") == 193 ) {
        die "Encode::CN not supported on EBCDIC\n";
    }
}
use strict;
use warnings;
use Encode;
our $VERSION = do { my @r = ( q$Revision: 2.3 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };

use XSLoader;
XSLoader::load( __PACKAGE__, $VERSION );

# Relocated from Encode.pm

use Encode::CN::HZ;

# use Encode::CN::2022_CN;

1;
__END__

=head1 NAME

Encode::CN - China-based Chinese Encodings

=head1 SYNOPSIS

    use Encode qw/encode decode/;
    $euc_cn = encode("euc-cn", $utf8);   # loads Encode::CN implicitly
    $utf8   = decode("euc-cn", $euc_cn); # ditto

=head1 DESCRIPTION

This module implements China-based Chinese charset encodings.
Encodings supported are as follows.

  Canonical       Alias                         Description
  --------------------------------------------------------------------
  euc-cn          /\beuc.*cn$/i                 EUC (Extended Unix Character)
                  /\bcn.*euc$/i
                  /\bGB[-_ ]?2312(?:\D.*$|$)/i  (see below)
  gb2312-raw                                    The raw (low-bit) GB2312 character map
  gb12345-raw                                   Traditional Chinese counterpart to
                                                GB2312 (raw)
  iso-ir-165                                    GB2312 + GB6345 + GB8565 + additions
  MacChineseSimp                                GB2312 + Apple Additions
  cp936                                         Code Page 936, also known as GBK
                                                (Extended GuoBiao)
  hz                                            7-bit escaped GB2312 encoding
  --------------------------------------------------------------------

To find how to use this module in detail, see L<Encode>.

=head1 NOTES

Due to size concerns, C<GB 18030> (an extension to C<GBK>) is distributed
separately on CPAN, under the name L<Encode::HanExtra>. That module
also contains extra Taiwan-based encodings.

=head1 BUGS

When you see C<charset=gb2312> on mails and web pages, they really
mean C<euc-cn> encodings. To fix that, C<gb2312> is aliased to C<euc-cn>.
Use C<gb2312-raw> when you really mean it.

The ASCII region (0x00-0x7f) is preserved for all encodings, even though
this conflicts with mappings by the Unicode Consortium.

=head1 SEE ALSO

L<Encode>

=cut

File: Unicode/UTF7.pm

#
# $Id: UTF7.pm,v 2.10 2017/06/10 17:23:50 dankogai Exp $
#
package Encode::Unicode::UTF7;
use strict;
use warnings;
use parent qw(Encode::Encoding);
__PACKAGE__->Define('UTF-7');
our $VERSION = do { my @r = ( q$Revision: 2.10 $ =~ /\d+/g ); sprintf "%d." . "%02d" x $#r, @r };
use MIME::Base64;
use Encode qw(find_encoding);

#
# Algorithms taken from Unicode::String by Gisle Aas
#

our $OPTIONAL_DIRECT_CHARS = 1;
my $specials = quotemeta "\'(),-./:?";
$OPTIONAL_DIRECT_CHARS
  and $specials .= quotemeta "!\"#$%&*;<=>@[]^_`{|}";

# \s will not work because it matches U+3000 IDEOGRAPHIC SPACE
# We use qr/[\n\r\t\ ] instead
my $re_asis    = qr/(?:[\n\r\t\ A-Za-z0-9$specials])/;
my $re_encoded = qr/(?:[^\n\r\t\ A-Za-z0-9$specials])/;
my $e_utf16    = find_encoding("UTF-16BE");

sub needs_lines { 1 }

sub encode($$;$) {
    my ( $obj, $str, $chk ) = @_;
    return undef unless defined $str;
    my $len = length($str);
    pos($str) = 0;
    my $bytes = substr($str, 0, 0);    # to propagate taintedness
    while ( pos($str) < $len ) {
        if ( $str =~ /\G($re_asis+)/ogc ) {
            my $octets = $1;
            utf8::downgrade($octets);
            $bytes .= $octets;
        }
        elsif ( $str =~ /\G($re_encoded+)/ogsc ) {
            if ( $1 eq "+" ) {
                $bytes .= "+-";
            }
            else {
                my $s = $1;
                my $base64 = encode_base64( $e_utf16->encode($s), '' );
                $base64 =~ s/=+$//;
                $bytes .= "+$base64-";
            }
        }
        else {
            die "This should not happen! (pos=" . pos($str) . ")";
        }
    }
    $_[1] = '' if $chk;
    return $bytes;
}

sub decode($$;$) {
    use re 'taint';
    my ( $obj, $bytes, $chk ) = @_;
    return undef unless defined $bytes;
    my $len = length($bytes);
    my $str = substr($bytes, 0, 0);    # to propagate taintedness;
    pos($bytes) = 0;
    no warnings 'uninitialized';
    while ( pos($bytes) < $len ) {
        if ( $bytes =~ /\G([^+]+)/ogc ) {
            $str .= $1;
        }
        elsif ( $bytes =~ /\G\+-/ogc ) {
            $str .= "+";
        }
        elsif ( $bytes =~ /\G\+([A-Za-z0-9+\/]+)-?/ogsc ) {
            my $base64 = $1;
            my $pad    = length($base64) % 4;
            $base64 .= "=" x ( 4 - $pad ) if $pad;
            $str .= $e_utf16->decode( decode_base64($base64) );
        }
        elsif ( $bytes =~ /\G\+/ogc ) {
            $^W and warn "Bad UTF7 data escape";
            $str .= "+";
        }
        else {
            die "This should not happen " . pos($bytes);
        }
    }
    $_[1] = '' if $chk;
    return $str;
}
1;
__END__

=head1 NAME

Encode::Unicode::UTF7 -- UTF-7 encoding

=head1 SYNOPSIS

    use Encode qw/encode decode/;
    $utf7 = encode("UTF-7", $utf8);
    $utf8 = decode("UTF-7", $utf7);

=head1 ABSTRACT

This module implements UTF-7 encoding documented in RFC 2152. UTF-7,
as its name suggests, is a 7-bit re-encoded version of UTF-16BE. It
is designed to be MTA-safe and was expected to become a standard way to
exchange Unicode text via mail. But with the advent of UTF-8 and
8-bit compliant MTAs, UTF-7 is hardly ever used.

UTF-7 was not supported by Encode until version 1.95 because of that.
But because Unicode::String, a module by Gisle Aas that adds Unicode
support to non-utf8-savvy perls, did support UTF-7, UTF-7 support was
added so that Encode can supersede Unicode::String 100%.

=head1 In Practice

When you want to encode Unicode for mails and web pages, however, do
not use UTF-7 unless you are sure your recipients and readers can
handle it. Very few MUAs and WWW Browsers support it these days (only
Mozilla seems to support it). For general cases, use UTF-8 for the
message body and MIME-Header for the header instead.

=head1 SEE ALSO

L<Encode>, L<Encode::Unicode>, L<Unicode::String>

RFC 2152 L<http://www.ietf.org/rfc/rfc2152.txt>

=cut
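
To make the UTF-7 behaviour described above a little more concrete, here is a minimal sketch that encodes a short mixed string and decodes it again. The sample string and variable names are assumptions for illustration; the only calls used are C<encode> and C<decode> from Encode, which load Encode::Unicode::UTF7 on demand.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: "A", U+2262 NOT IDENTICAL TO, U+0391 GREEK CAPITAL ALPHA, ".".
my $text = "A\x{2262}\x{0391}.";

# Characters outside the directly encoded set are emitted as base64 of
# their UTF-16BE form, wrapped between "+" and "-".
my $utf7 = encode( "UTF-7", $text );
print "UTF-7 octets: $utf7\n";

# Decoding unwraps the base64 runs and restores the original characters.
my $back = decode( "UTF-7", $utf7 );
print "round trip ok\n" if $back eq $text;
```

The encoded form contains only 7-bit octets, which is the property that made UTF-7 attractive for older mail transports.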