Methods
Constants
NORMALIZATION_FORMS = [:c, :kc, :d, :kd]
A list of all available normalization forms. See www.unicode.org/reports/tr15/tr15-29.html for more information about normalization.
UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]
The Unicode version that is supported by the implementation.
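As a quick check, the same value can be read from RbConfig in plain Ruby. This is a minimal sketch that assumes the running Ruby build exposes the "UNICODE_VERSION" key, as the constant definition implies.

```ruby
require "rbconfig"

# Unicode version the running Ruby interpreter was built against.
# Assumption: the "UNICODE_VERSION" key is present in this Ruby build.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
puts "Unicode version: #{unicode_version}"
```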
Attributes
[RW] default_normalization_form: The default normalization used for operations that require normalization. It can be set to any of the normalizations in NORMALIZATION_FORMS.
Instance Public methods
compose(codepoints)
Compose decomposed characters to the composed form.
Source code
# File activesupport/lib/active_support/multibyte/unicode.rb, line 67
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end
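To illustrate what composition does, here is a small sketch in plain Ruby that mirrors the method body shown above. The helper name compose_codepoints is hypothetical and only stands in for the module method.

```ruby
# Hypothetical helper mirroring the body of compose; not part of Rails.
def compose_codepoints(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end

# "e" (U+0065) followed by COMBINING ACUTE ACCENT (U+0301)
decomposed = [0x65, 0x301]

# NFC merges the pair into the single precomposed codepoint U+00E9.
p compose_codepoints(decomposed) # => [233]
```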
decompose(type, codepoints)
Decompose composed characters to the decomposed form.
Source code
# File activesupport/lib/active_support/multibyte/unicode.rb, line 58
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end
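The difference between the two decomposition types is easiest to see on a compatibility character such as a ligature. The sketch below uses plain Ruby; decompose_codepoints is a hypothetical helper with the same branching as the method above.

```ruby
# Hypothetical helper mirroring decompose; not part of Rails.
def decompose_codepoints(type, codepoints)
  form = type == :compatibility ? :nfkd : :nfd
  codepoints.pack("U*").unicode_normalize(form).codepoints
end

ligature = [0xFB01] # LATIN SMALL LIGATURE FI

# Canonical decomposition leaves the ligature untouched.
p decompose_codepoints(:canonical, ligature)     # => [64257]

# Compatibility decomposition splits it into "f" and "i".
p decompose_codepoints(:compatibility, ligature) # => [102, 105]

# A precomposed "é" (U+00E9) decomposes canonically into "e" + U+0301.
p decompose_codepoints(:canonical, [0xE9])       # => [101, 769]
```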
normalize(string, form = nil)
Returns the KC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations.
string - The string to perform normalization on.
form - The form you want to normalize in. Should be one of the following: :c, :kc, :d, or :kd. Default is ActiveSupport::Multibyte::Unicode.default_normalization_form.
Source code
# File activesupport/lib/active_support/multibyte/unicode.rb, line 118
def normalize(string, form = nil)
  form ||= @default_normalization_form
  # See https://www.unicode.org/reports/tr15, Table 1
  if alias_form = NORMALIZATION_FORM_ALIASES[form]
    ActiveSupport::Deprecation.warn(<<-MSG.squish)
      ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be
      removed from Rails 6.1. Use String#unicode_normalize(:#{alias_form}) instead.
    MSG
    string.unicode_normalize(alias_form)
  else
    ActiveSupport::Deprecation.warn(<<-MSG.squish)
      ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be
      removed from Rails 6.1. Use String#unicode_normalize instead.
    MSG
    raise ArgumentError, "#{form} is not a valid normalization variant", caller
  end
end
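Since the deprecation message points to String#unicode_normalize, a migration could look roughly like the sketch below. The FORM_ALIASES hash is an assumption about how the short form names map to Ruby's names (the actual NORMALIZATION_FORM_ALIASES constant is not shown on this page), and normalize_with_alias is a hypothetical helper.

```ruby
# Assumed mapping from the short form names to Ruby's normalization forms.
# The real NORMALIZATION_FORM_ALIASES is not shown on this page.
FORM_ALIASES = { c: :nfc, kc: :nfkc, d: :nfd, kd: :nfkd }.freeze

# Hypothetical replacement that skips the deprecated Rails method.
def normalize_with_alias(string, form = :kc)
  ruby_form = FORM_ALIASES.fetch(form) do
    raise ArgumentError, "#{form} is not a valid normalization variant"
  end
  string.unicode_normalize(ruby_form)
end

# NFKC folds the "fi" ligature (U+FB01) into two plain letters.
p normalize_with_alias("\uFB01")                 # => "fi"

# NFC composes "e" + U+0301 into U+00E9.
p normalize_with_alias("e\u0301", :c).codepoints # => [233]
```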
pack_graphemes(unpacked)
Reverse operation of unpack_graphemes.
Unicode.pack_graphemes(Unicode.unpack_graphemes('क्षि')) # => 'क्षि'
Source code
# File activesupport/lib/active_support/multibyte/unicode.rb, line 48
def pack_graphemes(unpacked)
  ActiveSupport::Deprecation.warn(<<-MSG.squish)
    ActiveSupport::Multibyte::Unicode#pack_graphemes is deprecated and will be
    removed from Rails 6.1. Use array.flatten.pack("U*") instead.
  MSG
  unpacked.flatten.pack("U*")
end
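The replacement suggested in the deprecation message can be tried in plain Ruby. In this sketch, pack_clusters is a hypothetical name for the one-liner, and the input array is illustrative.

```ruby
# Hypothetical helper wrapping the suggested replacement; not part of Rails.
def pack_clusters(unpacked)
  unpacked.flatten.pack("U*")
end

# "C", "a", "f", then "e" + COMBINING ACUTE ACCENT as one cluster.
clusters = [[67], [97], [102], [101, 769]]

packed = pack_clusters(clusters)
p packed.codepoints                   # => [67, 97, 102, 101, 769]
p packed.encoding == Encoding::UTF_8  # => true
```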
tidy_bytes(string, force = false)
Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.
Passing true as the force argument forcibly tidies all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.
Source code
# File activesupport/lib/active_support/multibyte/unicode.rb, line 78
def tidy_bytes(string, force = false)
  return string if string.empty?
  return recode_windows1252_chars(string) if force
  string.scrub { |bad| recode_windows1252_chars(bad) }
end
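The effect of the force flag can be sketched in plain Ruby. The body of the private recode_windows1252_chars helper is not shown on this page, so recode_cp1252 below is an assumed stand-in built on String#encode, and tidy_bytes_sketch is a hypothetical name.

```ruby
# Assumed stand-in for the private recode_windows1252_chars helper:
# reinterpret the bytes as Windows-1252 and transcode them to UTF-8.
def recode_cp1252(bytes)
  bytes.encode(Encoding::UTF_8, Encoding::Windows_1252,
               invalid: :replace, undef: :replace)
end

# Hypothetical re-creation of the tidy_bytes control flow.
def tidy_bytes_sketch(string, force = false)
  return string if string.empty?
  return recode_cp1252(string) if force
  string.scrub { |bad| recode_cp1252(bad) }
end

# A lone 0xE9 byte is invalid UTF-8 and is recoded as CP1252 "é".
p tidy_bytes_sketch("Caf\xE9").codepoints
# => [67, 97, 102, 233]

# With force, even valid UTF-8 bytes (C3 A9) are treated as CP1252.
p tidy_bytes_sketch("Caf\xC3\xA9", true).codepoints
# => [67, 97, 102, 195, 169]
```

Without force, only invalid byte sequences are recoded, so strings that are already valid UTF-8 pass through unchanged.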
unpack_graphemes(string)
Unpack the string at grapheme boundaries. Returns a list of codepoint lists, one per grapheme.
Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359], [2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
Source code
# File activesupport/lib/active_support/multibyte/unicode.rb, line 36
def unpack_graphemes(string)
  ActiveSupport::Deprecation.warn(<<-MSG.squish)
    ActiveSupport::Multibyte::Unicode#unpack_graphemes is deprecated and will be
    removed from Rails 6.1. Use string.scan(/\X/).map(&:codepoints) instead.
  MSG
  string.scan(/\X/).map(&:codepoints)
end
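The suggested replacement relies on the \X regexp escape, which matches one extended grapheme cluster. A minimal plain-Ruby sketch follows; unpack_clusters is a hypothetical name for the one-liner.

```ruby
# Hypothetical helper wrapping the suggested replacement; not part of Rails.
def unpack_clusters(string)
  string.scan(/\X/).map(&:codepoints)
end

# Decomposed "é": the base letter and its combining accent stay together.
p unpack_clusters("Cafe\u0301") # => [[67], [97], [102], [101, 769]]

# Precomposed "é" is a single codepoint in its own cluster.
p unpack_clusters("Caf\u00E9")  # => [[67], [97], [102], [233]]

# Flattening and packing the clusters restores the original string.
p unpack_clusters("Cafe\u0301").flatten.pack("U*") == "Cafe\u0301" # => true
```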