Primitive Type char [−]
A character type.
The char
type represents a single character. More specifically, since
'character' isn't a well-defined concept in Unicode, char
is a 'Unicode
scalar value', which is similar to, but not the same as, a 'Unicode code
point'.
This documentation describes a number of methods and trait implementations on the
char
type. For technical reasons, there is additional, separate
documentation in the std::char
module as well.
Representation
char
is always four bytes in size. This is a different representation than
a given character would have as part of a String
. For example:
let v = vec!['h', 'e', 'l', 'l', 'o']; // five elements times four bytes for each element assert_eq!(20, v.len() * std::mem::size_of::<char>()); let s = String::from("hello"); // five elements times one byte per element assert_eq!(5, s.len() * std::mem::size_of::<u8>());
As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' can be more than one Unicode code point; this ❤️ in particular is two:
fn main() { let s = String::from("❤️"); // we get two chars out of a single ❤️ let mut iter = s.chars(); assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next()); }let s = String::from("❤️"); // we get two chars out of a single ❤️ let mut iter = s.chars(); assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next());
This means it won't fit into a char
. Trying to create a literal with
let heart = '❤️';
gives an error:
error: character literal may only contain one codepoint: '❤
let heart = '❤️';
^~
Another implication of the 4-byte fixed size of a char
is that
per-char
processing can end up using a lot more memory:
let s = String::from("love: ❤️"); let v: Vec<char> = s.chars().collect(); assert_eq!(12, s.len() * std::mem::size_of::<u8>()); assert_eq!(32, v.len() * std::mem::size_of::<char>());
Methods
impl char
fn is_digit(self, radix: u32) -> bool
1.0.0
Checks if a char
is a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radicum are supported.
Compared to is_numeric()
, this function only recognizes the characters
0-9
, a-z
and A-Z
.
'Digit' is defined to be only the following characters:
0-9
a-z
A-Z
For a more comprehensive understanding of 'digit', see is_numeric()
.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
fn main() { assert!('1'.is_digit(10)); assert!('f'.is_digit(16)); assert!(!'f'.is_digit(10)); }assert!('1'.is_digit(10)); assert!('f'.is_digit(16)); assert!(!'f'.is_digit(10));
Passing a large radix, causing a panic:
fn main() { use std::thread; let result = thread::spawn(|| { // this panics '1'.is_digit(37); }).join(); assert!(result.is_err()); }use std::thread; let result = thread::spawn(|| { // this panics '1'.is_digit(37); }).join(); assert!(result.is_err());
fn to_digit(self, radix: u32) -> Option<u32>
1.0.0
Converts a char
to a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexadecimal, to give some common values. Arbitrary radicum are supported.
'Digit' is defined to be only the following characters:
0-9
a-z
A-Z
Errors
Returns None
if the char
does not refer to a digit in the given radix.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
fn main() { assert_eq!('1'.to_digit(10), Some(1)); assert_eq!('f'.to_digit(16), Some(15)); }assert_eq!('1'.to_digit(10), Some(1)); assert_eq!('f'.to_digit(16), Some(15));
Passing a non-digit results in failure:
fn main() { assert_eq!('f'.to_digit(10), None); assert_eq!('z'.to_digit(16), None); }assert_eq!('f'.to_digit(10), None); assert_eq!('z'.to_digit(16), None);
Passing a large radix, causing a panic:
fn main() { use std::thread; let result = thread::spawn(|| { '1'.to_digit(37); }).join(); assert!(result.is_err()); }use std::thread; let result = thread::spawn(|| { '1'.to_digit(37); }).join(); assert!(result.is_err());
fn escape_unicode(self) -> EscapeUnicode
1.0.0
Returns an iterator that yields the hexadecimal Unicode escape of a
character, as char
s.
All characters are escaped with Rust syntax of the form \\u{NNNN}
where NNNN
is the shortest hexadecimal representation.
Examples
Basic usage:
fn main() { for c in '❤'.escape_unicode() { print!("{}", c); } println!(""); }for c in '❤'.escape_unicode() { print!("{}", c); } println!("");
This prints:
\u{2764}
Collecting into a String
:
let heart: String = '❤'.escape_unicode().collect(); assert_eq!(heart, r"\u{2764}");
fn escape_default(self) -> EscapeDefault
1.0.0
Returns an iterator that yields the literal escape code of a char
.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
- Tab is escaped as
\t
. - Carriage return is escaped as
\r
. - Line feed is escaped as
\n
. - Single quote is escaped as
\'
. - Double quote is escaped as
\"
. - Backslash is escaped as
\\
. - Any character in the 'printable ASCII' range
0x20
..0x7e
inclusive is not escaped. - All other characters are given hexadecimal Unicode escapes; see
escape_unicode
.
Examples
Basic usage:
fn main() { for i in '"'.escape_default() { println!("{}", i); } }for i in '"'.escape_default() { println!("{}", i); }
This prints:
\
"
Collecting into a String
:
let quote: String = '"'.escape_default().collect(); assert_eq!(quote, "\\\"");
fn len_utf8(self) -> usize
1.0.0
Returns the number of bytes this char
would need if encoded in UTF-8.
That number of bytes is always between 1 and 4, inclusive.
Examples
Basic usage:
fn main() { let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4); }let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4);
The &str
type guarantees that its contents are UTF-8, and so we can compare the length it
would take if each code point was represented as a char
vs in the &str
itself:
// as chars let eastern = '東'; let capitol = '京'; // both can be represented as three bytes assert_eq!(3, eastern.len_utf8()); assert_eq!(3, capitol.len_utf8()); // as a &str, these two are encoded in UTF-8 let tokyo = "東京"; let len = eastern.len_utf8() + capitol.len_utf8(); // we can see that they take six bytes total... assert_eq!(6, tokyo.len()); // ... just like the &str assert_eq!(len, tokyo.len());
fn len_utf16(self) -> usize
1.0.0
Returns the number of 16-bit code units this char
would need if
encoded in UTF-16.
See the documentation for len_utf8()
for more explanation of this
concept. This function is a mirror, but for UTF-16 instead of UTF-8.
Examples
Basic usage:
fn main() { let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2); }let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2);
fn encode_utf8(self) -> EncodeUtf8
unicode
#27784)Returns an interator over the bytes of this character as UTF-8.
The returned iterator also has an as_slice()
method to view the
encoded bytes as a byte slice.
Examples
#![feature(unicode)] fn main() { let iterator = 'ß'.encode_utf8(); assert_eq!(iterator.as_slice(), [0xc3, 0x9f]); for (i, byte) in iterator.enumerate() { println!("byte {}: {:x}", i, byte); } }#![feature(unicode)] let iterator = 'ß'.encode_utf8(); assert_eq!(iterator.as_slice(), [0xc3, 0x9f]); for (i, byte) in iterator.enumerate() { println!("byte {}: {:x}", i, byte); }
fn encode_utf16(self) -> EncodeUtf16
unicode
#27784)Returns an interator over the u16
entries of this character as UTF-16.
The returned iterator also has an as_slice()
method to view the
encoded form as a slice.
Examples
#![feature(unicode)] fn main() { let iterator = '𝕊'.encode_utf16(); assert_eq!(iterator.as_slice(), [0xd835, 0xdd4a]); for (i, val) in iterator.enumerate() { println!("entry {}: {:x}", i, val); } }#![feature(unicode)] let iterator = '𝕊'.encode_utf16(); assert_eq!(iterator.as_slice(), [0xd835, 0xdd4a]); for (i, val) in iterator.enumerate() { println!("entry {}: {:x}", i, val); }
fn is_alphabetic(self) -> bool
1.0.0
Returns true if this char
is an alphabetic code point, and false if not.
Examples
Basic usage:
fn main() { assert!('a'.is_alphabetic()); assert!('京'.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic()); }assert!('a'.is_alphabetic()); assert!('京'.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic());
fn is_xid_start(self) -> bool
unicode
): mainly needed for compiler internals
Returns true if this char
satisfies the 'XID_Start' Unicode property, and false
otherwise.
'XID_Start' is a Unicode Derived Property specified in
UAX #31,
mostly similar to ID_Start
but modified for closure under NFKx
.
fn is_xid_continue(self) -> bool
unicode
): mainly needed for compiler internals
Returns true if this char
satisfies the 'XID_Continue' Unicode property, and false
otherwise.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
fn is_lowercase(self) -> bool
1.0.0
Returns true if this char
is lowercase, and false otherwise.
'Lowercase' is defined according to the terms of the Unicode Derived Core
Property Lowercase
.
Examples
Basic usage:
fn main() { assert!('a'.is_lowercase()); assert!('δ'.is_lowercase()); assert!(!'A'.is_lowercase()); assert!(!'Δ'.is_lowercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_lowercase()); }assert!('a'.is_lowercase()); assert!('δ'.is_lowercase()); assert!(!'A'.is_lowercase()); assert!(!'Δ'.is_lowercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_lowercase());
fn is_uppercase(self) -> bool
1.0.0
Returns true if this char
is uppercase, and false otherwise.
'Uppercase' is defined according to the terms of the Unicode Derived Core
Property Uppercase
.
Examples
Basic usage:
fn main() { assert!(!'a'.is_uppercase()); assert!(!'δ'.is_uppercase()); assert!('A'.is_uppercase()); assert!('Δ'.is_uppercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_uppercase()); }assert!(!'a'.is_uppercase()); assert!(!'δ'.is_uppercase()); assert!('A'.is_uppercase()); assert!('Δ'.is_uppercase()); // The various Chinese scripts do not have case, and so: assert!(!'中'.is_uppercase());
fn is_whitespace(self) -> bool
1.0.0
Returns true if this char
is whitespace, and false otherwise.
'Whitespace' is defined according to the terms of the Unicode Derived Core
Property White_Space
.
Examples
Basic usage:
fn main() { assert!(' '.is_whitespace()); // a non-breaking space assert!('\u{A0}'.is_whitespace()); assert!(!'越'.is_whitespace()); }assert!(' '.is_whitespace()); // a non-breaking space assert!('\u{A0}'.is_whitespace()); assert!(!'越'.is_whitespace());
fn is_alphanumeric(self) -> bool
1.0.0
Returns true if this char
is alphanumeric, and false otherwise.
'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
Examples
Basic usage:
fn main() { assert!('٣'.is_alphanumeric()); assert!('7'.is_alphanumeric()); assert!('৬'.is_alphanumeric()); assert!('K'.is_alphanumeric()); assert!('و'.is_alphanumeric()); assert!('藏'.is_alphanumeric()); assert!(!'¾'.is_alphanumeric()); assert!(!'①'.is_alphanumeric()); }assert!('٣'.is_alphanumeric()); assert!('7'.is_alphanumeric()); assert!('৬'.is_alphanumeric()); assert!('K'.is_alphanumeric()); assert!('و'.is_alphanumeric()); assert!('藏'.is_alphanumeric()); assert!(!'¾'.is_alphanumeric()); assert!(!'①'.is_alphanumeric());
fn is_control(self) -> bool
1.0.0
Returns true if this char
is a control code point, and false otherwise.
'Control code point' is defined in terms of the Unicode General
Category Cc
.
Examples
Basic usage:
fn main() { // U+009C, STRING TERMINATOR assert!(''.is_control()); assert!(!'q'.is_control()); }// U+009C, STRING TERMINATOR assert!(''.is_control()); assert!(!'q'.is_control());
fn is_numeric(self) -> bool
1.0.0
Returns true if this char
is numeric, and false otherwise.
'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.
Examples
Basic usage:
fn main() { assert!('٣'.is_numeric()); assert!('7'.is_numeric()); assert!('৬'.is_numeric()); assert!(!'K'.is_numeric()); assert!(!'و'.is_numeric()); assert!(!'藏'.is_numeric()); assert!(!'¾'.is_numeric()); assert!(!'①'.is_numeric()); }assert!('٣'.is_numeric()); assert!('7'.is_numeric()); assert!('৬'.is_numeric()); assert!(!'K'.is_numeric()); assert!(!'و'.is_numeric()); assert!(!'藏'.is_numeric()); assert!(!'¾'.is_numeric()); assert!(!'①'.is_numeric());
fn to_lowercase(self) -> ToLowercase
1.0.0
Returns an iterator that yields the lowercase equivalent of a char
.
If no conversion is possible then an iterator with just the input character is returned.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its lowercase equivalent according to the
Unicode database and the additional complex mappings
SpecialCasing.txt
. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see here.
Examples
Basic usage:
fn main() { assert_eq!('C'.to_lowercase().next(), Some('c')); // Japanese scripts do not have case, and so: assert_eq!('山'.to_lowercase().next(), Some('山')); }assert_eq!('C'.to_lowercase().next(), Some('c')); // Japanese scripts do not have case, and so: assert_eq!('山'.to_lowercase().next(), Some('山'));
fn to_uppercase(self) -> ToUppercase
1.0.0
Returns an iterator that yields the uppercase equivalent of a char
.
If no conversion is possible then an iterator with just the input character is returned.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its uppercase equivalent according to the
Unicode database and the additional complex mappings
SpecialCasing.txt
. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see here.
Examples
Basic usage:
fn main() { assert_eq!('c'.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: assert_eq!('山'.to_uppercase().next(), Some('山')); }assert_eq!('c'.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: assert_eq!('山'.to_uppercase().next(), Some('山'));
In Turkish, the equivalent of 'i' in Latin has five forms instead of two:
- 'Dotless': I / ı, sometimes written ï
- 'Dotted': İ / i
Note that the lowercase dotted 'i' is the same as the Latin. Therefore:
fn main() { let upper_i = 'i'.to_uppercase().next(); }let upper_i = 'i'.to_uppercase().next();
The value of upper_i
here relies on the language of the text: if we're
in en-US
, it should be Some('I')
, but if we're in tr_TR
, it should
be Some('İ')
. to_uppercase()
does not take this into account, and so:
let upper_i = 'i'.to_uppercase().next(); assert_eq!(Some('I'), upper_i);
holds across languages.
Trait Implementations
impl Display for char
1.0.0
impl Debug for char
1.0.0
impl Hash for char
1.0.0
fn hash<H>(&self, state: &mut H) where H: Hasher
fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher
1.3.0
impl<'a> Pattern<'a> for char
Searches for chars that are equal to a given char
type Searcher = CharSearcher<'a>
fn into_searcher(self, haystack: &'a str) -> CharSearcher<'a>
fn is_contained_in(self, haystack: &'a str) -> bool
fn is_prefix_of(self, haystack: &'a str) -> bool
fn is_suffix_of(self, haystack: &'a str) -> bool where CharSearcher<'a>: ReverseSearcher<'a>
impl Default for char
1.0.0
impl Clone for char
1.0.0
fn clone(&self) -> char
Returns a deep copy of the value.