
RFC 2047: MIME Part 3 — Message Header Extensions for Non-ASCII Text

Standards Track | MIME (Multipurpose Internet Mail Extensions) | Published November 1996
ELI5: Email headers like Subject and From are restricted to plain ASCII — English letters and basic punctuation. RFC 2047 is the trick that lets you write a Subject line in Japanese, a sender name in Arabic, or an accented name in French. It encodes non-ASCII characters into an ASCII-safe wrapper that any mail server can transport, and mail clients decode it back for display.

Why This Exists

RFC 2045 and RFC 2046 solved the problem of carrying non-ASCII content in message bodies with Content-Transfer-Encoding. But email headers — Subject, From display names, To display names, and others — are governed by RFC 5322, which restricts them to 7-bit US-ASCII.

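This creates an obvious problem: billions of email users write in languages that require non-ASCII characters. Without RFC 2047, you could not send an email with a Subject line in Japanese, a sender display name in Arabic, or an accented French name in the From field.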

RFC 2047 defines the encoded-word syntax: a compact way to embed non-ASCII text inside ASCII-only headers, readable by any MIME-aware mail client.

How It Works

The Encoded-Word Format

An encoded-word has this structure:

=?charset?encoding?encoded-text?=

The three components are:

Component      Purpose                                  Values
charset        The character set of the original text   UTF-8, ISO-8859-1, ISO-2022-JP, etc.
encoding       How the text is encoded into ASCII       B (base64) or Q (quoted-printable variant)
encoded-text   The encoded representation               ASCII characters only
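The three components can be pulled apart with a regular expression. The following is a minimal sketch, not a full RFC 2047 parser; the names `ENCODED_WORD` and `split_encoded_word` are illustrative and do not come from any library.

```python
import re

# One encoded-word: =?charset?encoding?encoded-text?=
# The encoded-text may not contain "?" or whitespace.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")

def split_encoded_word(word):
    """Return (charset, encoding, encoded_text), or None if word is not an encoded-word."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return None
    charset, encoding, text = match.groups()
    # The encoding letter is case-insensitive, so normalize it.
    return charset, encoding.upper(), text

print(split_encoded_word("=?UTF-8?Q?Caf=C3=A9_menu?="))
# ('UTF-8', 'Q', 'Caf=C3=A9_menu')
```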

B Encoding (Base64)

Uses standard base64 encoding. Best for text that is heavily non-ASCII, such as CJK scripts:

; Subject: "Meeting confirmation" in Japanese
Subject: =?UTF-8?B?5Lya6K2w44Gu56K66KqN?=

; From display name in Chinese
From: =?UTF-8?B?5byg5LiJ?= <zhang@example.com>
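As a rough illustration, B encoding can be reproduced with Python's standard `base64` module. The helper name `encode_b` is hypothetical, and the sketch ignores the length limit on encoded-words.

```python
import base64

def encode_b(text, charset="utf-8"):
    """Wrap text in a single B-encoded word (no length checking)."""
    raw = text.encode(charset)                       # text -> bytes in the declared charset
    encoded = base64.b64encode(raw).decode("ascii")  # bytes -> ASCII-only base64
    return f"=?{charset.upper()}?B?{encoded}?="

print(encode_b("张三"))  # =?UTF-8?B?5byg5LiJ?=
```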

Q Encoding (Quoted-Printable Variant)

A modified quoted-printable encoding optimized for headers. Like body QP, non-ASCII bytes become =XX hex pairs. Key difference: spaces are encoded as underscores (_), so a literal underscore, equals sign, or question mark must itself be written as =5F, =3D, or =3F:

; Subject: "Café menu" with accented e
Subject: =?UTF-8?Q?Caf=C3=A9_menu?=

; From display name: "René Dupont"
From: =?UTF-8?Q?Ren=C3=A9_Dupont?= <rene@example.com>

; Subject: "Grüße aus Berlin" (German greetings)
Subject: =?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_Berlin?=

Q encoding is more human-readable when most of the text is ASCII with just a few non-ASCII characters. B encoding is more compact when most characters are non-ASCII.
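A minimal sketch of Q encoding follows. It assumes the most conservative set of literal characters (letters, digits, and `!*+-/`), which is safe in every header position; the names `SAFE` and `encode_q` are illustrative, and the sketch does not enforce the length limit.

```python
# Bytes that may appear literally; everything else becomes =XX.
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")

def encode_q(text, charset="utf-8"):
    """Wrap text in a single Q-encoded word (no length checking)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")              # space -> underscore
        elif byte in SAFE:
            out.append(chr(byte))        # safe ASCII stays readable
        else:
            out.append(f"={byte:02X}")   # anything else -> =XX (uppercase hex)
    return f"=?{charset.upper()}?Q?{''.join(out)}?="

print(encode_q("Café menu"))  # =?UTF-8?Q?Caf=C3=A9_menu?=
```

Because the set is conservative, characters such as a colon are also written as =XX here, even though they may appear literally in an unstructured header like Subject.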

Where Encoded-Words Can Appear

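Encoded-words are allowed in three positions (RFC 2047, section 5): as text in an unstructured header such as Subject or Comments, inside a comment delimited by parentheses, and in place of a word within a phrase, such as the display name that precedes an address in From, To, or Cc.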

Encoded-words are not allowed inside quoted-strings, in the local-part or domain of an email address, or as parameter values in structured headers like Content-Type (use RFC 2231 for that).
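One way to see the display-name rule in practice is Python's standard `email.utils.formataddr`, which encodes only the display name and leaves the address itself untouched. This is a sketch; the wrapper `build_from_value` is a hypothetical helper, and the exact choice of B or Q encoding is left to the library.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

def build_from_value(display_name, address):
    """Return an ASCII-only value for a From/To header."""
    # formataddr encodes a non-ASCII display name as an encoded-word;
    # the address part must already be plain ASCII.
    return formataddr((display_name, address))

value = build_from_value("René Dupont", "rene@example.com")

# Round trip: split the value, then decode the encoded display name.
name, address = parseaddr(value)
print(address)                               # rene@example.com
print(str(make_header(decode_header(name))))  # René Dupont
```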

Key Technical Details

Length Limits

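Each encoded-word must not exceed 75 characters, counting the charset, encoding, encoded-text, and all delimiters. If the encoded text is longer, it must be split into multiple encoded-words separated by folding whitespace (CRLF + space or tab), and each encoded-word must contain only whole characters: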

; Long subject split across two encoded-words
Subject: =?UTF-8?B?5LuK5pel44Gu5Lya6K2w44Gr44Gk44GE44Gm?=
 =?UTF-8?B?44GU5qGI5YaF44GE44Gf44GX44G+44GZ?=

When two adjacent encoded-words are separated only by linear whitespace, the whitespace between them is ignored during decoding. This allows seamless splitting of long text across multiple encoded-words.
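The splitting rule can be sketched as follows. This assumes a stateless charset such as UTF-8 (it would not be correct for ISO-2022-JP), and `split_b_encoded` is a hypothetical helper name. Characters are added one at a time so that no multi-byte character is cut in half.

```python
import base64

MAX_ENCODED_WORD = 75  # limit per encoded-word, delimiters included

def split_b_encoded(text, charset="utf-8"):
    """Split text into B-encoded words that each fit the 75-character limit."""
    prefix = f"=?{charset.upper()}?B?"
    overhead = len(prefix) + len("?=")
    # base64 produces 4 output characters per 3 input bytes.
    max_bytes = (MAX_ENCODED_WORD - overhead) // 4 * 3

    def wrap(chunk):
        return prefix + base64.b64encode(chunk).decode("ascii") + "?="

    words, chunk = [], b""
    for char in text:
        encoded_char = char.encode(charset)
        # Start a new encoded-word rather than splitting a character.
        if chunk and len(chunk) + len(encoded_char) > max_bytes:
            words.append(wrap(chunk))
            chunk = b""
        chunk += encoded_char
    if chunk:
        words.append(wrap(chunk))
    return words

# Joining with CRLF + space yields the folded form.
print("\r\n ".join(split_b_encoded("今日の会議についてご案内いたします" * 3)))
```

The sketch does not account for the header name on the first line; a caller that prepends "Subject: " would need a smaller budget for the first encoded-word.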

Charset Selection

Always use UTF-8 for new messages. The other charsets exist for legacy reasons:

Charset       Use Case                        Recommendation
UTF-8         Covers all Unicode characters   Always use this
ISO-8859-1    Western European legacy         Do not use in new messages
ISO-2022-JP   Japanese legacy encoding        Still seen from some Japanese mail clients
GB2312        Simplified Chinese legacy       Do not use in new messages

Interaction with Header Folding

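RFC 5322 limits header lines to 998 characters and recommends keeping them to 78 characters or fewer, excluding the CRLF. RFC 2047 is stricter for its own syntax: a line that contains an encoded-word may be at most 76 characters long. Encoded-words interact with folding: you can break between encoded-words at whitespace boundaries, but you must never break in the middle of an encoded-word. The =?...?= wrapper must be on a single line.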
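A small sketch of folding that breaks only between encoded-words is shown below. The function name `fold_header` is illustrative, and the 76-character default reflects the RFC 2047 limit for lines containing encoded-words.

```python
def fold_header(name, words, max_line=76):
    """Join encoded-words into a header, folding only between words."""
    lines, current = [], f"{name}:"
    for word in words:
        # Fold before this word if it would push the line past the limit,
        # unless the line so far holds nothing but the field name.
        if len(current) + 1 + len(word) > max_line and current != f"{name}:":
            lines.append(current)
            current = ""  # continuation line; the leading space is added below
        current += " " + word
    lines.append(current)
    return "\r\n".join(lines)

folded = fold_header("Subject", [
    "=?UTF-8?B?5LuK5pel44Gu5Lya6K2w44Gr44Gk44GE44Gm?=",
    "=?UTF-8?B?44GU5qGI5YaF44GE44Gf44GX44G+44GZ?=",
])
print(folded)
```

Each encoded-word is treated as an indivisible unit, so a fold can never land inside the =?...?= wrapper.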

Decoding Rules

When a mail client encounters an encoded-word, it:

  1. Extracts the charset, encoding type, and encoded text from the =?charset?encoding?text?= wrapper.
  2. Decodes the text to raw bytes using base64 (B) or the Q variant of quoted-printable, in which each underscore becomes a space and each =XX becomes one byte.
  3. Interprets the resulting bytes according to the declared charset.
  4. Displays the decoded Unicode text to the user.

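If the client does not recognize the charset, RFC 2047 lets it display the encoded-word as-is, make a best-effort rendering, or substitute a notice that the text could not be shown. Displaying the encoded-word unchanged is usually preferable to showing garbled text.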
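The four steps above can be sketched as a small decoder. This is a simplified illustration with hypothetical names (`decode_word`, `decode_header_value`); it leaves an encoded-word untouched when the charset is unknown or the payload is malformed. For real applications, Python's standard `email.header.decode_header` covers the same ground.

```python
import base64
import re

# An encoded-word, plus any whitespace that separates it from a following
# "=?" (that whitespace is dropped on decoding). The lookahead is a
# simplification: it checks only for "=?", not a complete encoded-word.
ENCODED_WORD = re.compile(
    r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=(?:[ \t\r\n]+(?==\?))?"
)

def decode_word(match):
    charset, encoding, text = match.groups()   # step 1: extract the parts
    try:
        if encoding.upper() == "B":            # step 2: undo B or Q encoding
            raw = base64.b64decode(text, validate=True)
        else:
            raw = text.encode("ascii").replace(b"_", b" ")
            raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                         lambda m: bytes([int(m.group(1), 16)]), raw)
        return raw.decode(charset)             # step 3: bytes -> text via charset
    except (LookupError, ValueError):
        return match.group(0)                  # unknown charset or bad data: show as-is

def decode_header_value(value):
    return ENCODED_WORD.sub(decode_word, value)

# step 4: display
print(decode_header_value("=?UTF-8?Q?Caf=C3=A9_menu?="))  # Café menu
```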

Examples

A Complete Message with Encoded Headers

MIME-Version: 1.0
From: =?UTF-8?Q?Ren=C3=A9_Dupont?= <rene@example.fr>
To: =?UTF-8?B?5bGx55Sw5aSq6YOO?= <yamada@example.jp>
Subject: =?UTF-8?Q?Re:_R=C3=A9union_du_15_mars?=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Bonjour Taro,

Confirmons la r=C3=A9union pour le 15 mars.

Note: the header uses RFC 2047 encoded-words (=?...?=), while the body uses regular quoted-printable encoding (=XX without the wrapper). These are different mechanisms for different parts of the message.

Encoding Comparison

The same text — "München" — encoded both ways:

; Q encoding: readable, good for mostly-ASCII text
=?UTF-8?Q?M=C3=BCnchen?=

; B encoding: compact but opaque
=?UTF-8?B?TcO8bmNoZW4=?=
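The trade-off can be measured directly. The sketch below encodes the same bytes both ways and keeps the shorter payload, preferring Q on a tie because it is more readable. The helper names are hypothetical, and the Q step assumes the conservative literal set of letters, digits, and `!*+-/`.

```python
import base64

SAFE = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/"

def q_text(raw):
    """Q-encode raw bytes (payload only, without the =?...?= wrapper)."""
    return "".join(
        "_" if b == 0x20 else chr(b) if b in SAFE else f"={b:02X}" for b in raw
    )

def b_text(raw):
    """B-encode raw bytes (payload only)."""
    return base64.b64encode(raw).decode("ascii")

def shortest_encoded_word(text, charset="utf-8"):
    raw = text.encode(charset)
    q, b = q_text(raw), b_text(raw)
    # Prefer Q when the lengths tie, since it stays human-readable.
    encoding, payload = ("Q", q) if len(q) <= len(b) else ("B", b)
    return f"=?{charset.upper()}?{encoding}?{payload}?="

print(shortest_encoded_word("München"))  # =?UTF-8?Q?M=C3=BCnchen?=
print(shortest_encoded_word("张三"))      # =?UTF-8?B?5byg5LiJ?=
```

For "München" both payloads are 12 characters long, so Q wins on readability; for fully non-ASCII text, Q needs three characters per byte and B is clearly shorter.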

Common Mistakes

Deliverability Impact

Related RFCs