RFC 2231: MIME Parameter Value Extensions
Why This Exists
MIME parameters appear in headers like Content-Type and Content-Disposition. The most common use case is the filename parameter for attachments:
Content-Disposition: attachment; filename="report.pdf"
This works fine for ASCII filenames. But what about a file named 報告書.pdf (Japanese for "report") or a filename with 200 characters? The original MIME specification (RFC 2045) had no mechanism for this. RFC 2231 solves three problems:
- Character sets: encoding non-ASCII characters in parameter values
- Language tags: annotating parameter values with a language identifier
- Continuations: splitting long parameter values across multiple lines
How It Works
Character Set and Language Encoding
To include non-ASCII characters, append an asterisk to the parameter name and use the format charset'language'encoded-value:
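Content-Disposition: attachment; filename*=UTF-8''%E5%A0%B1%E5%91%8A%E6%9B%B8.pdf
Reading the value from left to right: UTF-8 is the charset, the empty string between the two single quotes is the language tag (empty means no language tag), and the remainder is the percent-encoded value.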
The value %E5%A0%B1%E5%91%8A%E6%9B%B8 is the UTF-8 bytes for "報告書" percent-encoded. The language tag between the single quotes is optional (often left empty).
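To make the charset'language'encoded-value layout concrete, here is a minimal Python sketch that takes such a value apart. The function name decode_ext_value is hypothetical, and treating an empty charset as US-ASCII is an assumption of this sketch rather than a rule stated above.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Split charset'language'encoded-value and percent-decode the value."""
    # Split on the first two single quotes only. A value without both
    # quotes raises ValueError when unpacking, which is acceptable here.
    charset, language, encoded = ext_value.split("'", 2)
    # Assumption: fall back to US-ASCII when the charset field is empty.
    text = unquote(encoded, encoding=charset or "us-ascii", errors="strict")
    return charset, language, text


print(decode_ext_value("UTF-8''%E5%A0%B1%E5%91%8A%E6%9B%B8.pdf"))
```

The language field comes back as an empty string when the sender omitted it, so callers can ignore it or pass it along unchanged.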
Continuations for Long Values
When a parameter value is too long for a single header line, split it using numbered continuations:
Content-Disposition: attachment;
 filename*0="very-long-document-name-that-exceeds-the";
 filename*1="-reasonable-line-length-limit-for-headers.pdf"
The parts are reassembled in numeric order: *0, *1, *2, etc.
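On the sending side, the split might look like the sketch below. The helper name split_continuations and the 40-character segment size are assumptions chosen for illustration; the text above does not prescribe a segment length. The sketch only handles plain ASCII values and refuses characters that would need escaping inside a quoted string.

```python
def split_continuations(name, value, max_segment=40):
    """Return parameter strings name*0="...", name*1="...", ... for an ASCII value."""
    if not value.isascii():
        raise ValueError("plain continuations carry ASCII only; use the encoded form")
    if '"' in value or "\\" in value:
        raise ValueError("this sketch does not escape quotes or backslashes")
    # Cut the value into fixed-size chunks; an empty value yields one empty chunk.
    chunks = [value[i:i + max_segment] for i in range(0, len(value), max_segment)] or [""]
    # Number the chunks from 0 so the receiver can reassemble them in order.
    return [f'{name}*{index}="{chunk}"' for index, chunk in enumerate(chunks)]


long_name = "very-long-document-name-that-exceeds-the-reasonable-line-length-limit-for-headers.pdf"
segments = split_continuations("filename", long_name)
# Fold the header so each segment sits on its own continuation line.
print("Content-Disposition: attachment;\n " + ";\n ".join(segments))
```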
Combined: Continuations with Character Sets
For long non-ASCII values, combine both features. Only the first segment includes the charset and language; subsequent segments are just encoded values:
Content-Disposition: attachment;
 filename*0*=UTF-8''%E3%81%93%E3%82%8C%E3%81%AF%E9%95%B7;
 filename*1*=%E3%81%84%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB;
 filename*2*=%E5%90%8D.pdf
In each parameter name, the segment number followed by a trailing asterisk (*0*, *1*, *2*) marks an encoded continuation.
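A receiver has to undo both steps. The following is a minimal sketch, assuming the header has already been split into a dict of raw parameter names and values with surrounding quotes removed; the function name reassemble and that input shape are hypothetical. It joins the segments as bytes before decoding, on the assumption that a sender may have split a multi-byte character across two segments.

```python
import re
from urllib.parse import unquote_to_bytes

# Matches name*N (plain continuation) and name*N* (encoded continuation).
SEGMENT = re.compile(r"^(?P<base>[^*]+)\*(?P<index>\d+)(?P<encoded>\*?)$")


def reassemble(params, base):
    """Rebuild one parameter value from its numbered continuation segments."""
    segments = []
    for name, value in params.items():
        match = SEGMENT.match(name)
        if match and match["base"].lower() == base.lower():
            segments.append((int(match["index"]), bool(match["encoded"]), value))
    if not segments:
        raise KeyError(base)
    segments.sort()  # numeric order, so *10 sorts after *2
    if [index for index, _, _ in segments] != list(range(len(segments))):
        raise ValueError("continuation indexes must run 0, 1, 2, ... with no gaps")

    charset = "us-ascii"  # assumption: default when no charset is given
    _, first_encoded, first_value = segments[0]
    if first_encoded:
        # Only the first segment carries charset'language'.
        charset_part, _language, rest = first_value.split("'", 2)
        charset = charset_part or charset
        segments[0] = (0, True, rest)

    # Percent-decode encoded segments, keep plain ones as ASCII bytes.
    raw = b"".join(
        unquote_to_bytes(value) if encoded else value.encode("ascii")
        for _, encoded, value in segments
    )
    return raw.decode(charset)


params = {
    "filename*0*": "UTF-8''%E3%81%93%E3%82%8C%E3%81%AF%E9%95%B7",
    "filename*2*": "%E5%90%8D.pdf",
    "filename*1*": "%E3%81%84%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB",
}
print(reassemble(params, "filename"))
```

The segments are deliberately listed out of order in the usage example to show that the numeric index, not the position in the header, decides the order.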
Key Technical Details
Parameter Name Syntax
| Form | Meaning | Example |
|---|---|---|
| filename | Plain ASCII value | filename="report.pdf" |
| filename* | Encoded value (charset'lang'value) | filename*=UTF-8''%E5%A0%B1.pdf |
| filename*0 | Continuation part 0, plain ASCII | filename*0="very-long-" |
| filename*0* | Continuation part 0, encoded | filename*0*=UTF-8''%E5%A0%B1 |
Encoding Rules
- Use percent-encoding (like URL encoding) for non-ASCII octets
- ASCII characters that are safe in MIME tokens (letters, digits, hyphens, periods) do not need encoding; spaces, asterisks, single quotes, and percent signs must always be percent-encoded
- The charset is almost always UTF-8 in modern practice
- The language tag follows BCP 47 (e.g., en, ja, zh-CN) but is often omitted
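These rules can be sketched with Python's standard urllib.parse.quote. The function name encode_ext_value is hypothetical. Passing safe="" is a deliberately conservative choice of this sketch: quote() then leaves only letters, digits, and the characters _ . - ~ unescaped, which is a subset of what the format permits.

```python
from urllib.parse import quote


def encode_ext_value(value, charset="UTF-8", language=""):
    """Build charset'language'encoded-value for an encoded parameter."""
    # safe="" escapes everything except letters, digits, and "_.-~",
    # so spaces, "*", "'", and "%" are always percent-encoded.
    encoded = quote(value, safe="", encoding=charset, errors="strict")
    return f"{charset}'{language}'{encoded}"


print("filename*=" + encode_ext_value("報告書.pdf"))
print("filename*=" + encode_ext_value("it's 100%*.txt", language="en"))
```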
Interaction with RFC 2047
RFC 2047 provides encoded-words (=?UTF-8?B?...?=) for non-ASCII text in headers. However, RFC 2047 explicitly states that encoded-words must NOT appear inside quoted strings or parameter values. RFC 2231 is the correct mechanism for MIME parameters. Despite this rule, many mail clients use RFC 2047 in filename parameters anyway, so robust parsers must handle both.
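For the lenient side of such a parser, Python's standard email.header module can decode encoded-words that turn up in a filename value. This is a minimal sketch; the function name decode_lenient is hypothetical, and a real parser would normally prefer a filename* parameter when one is present and only fall back to this path otherwise.

```python
from email.header import decode_header, make_header


def decode_lenient(value):
    """Decode RFC 2047 encoded-words if present; plain text passes through."""
    # decode_header() returns (chunk, charset) pairs; make_header() joins
    # them into a Header whose str() is the decoded Unicode text.
    return str(make_header(decode_header(value)))


# A filename as some clients send it, against the rules of RFC 2047.
print(decode_lenient("=?UTF-8?B?5aCx5ZGK5pu4LnBkZg==?="))
print(decode_lenient("report.pdf"))
```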
Common Mistakes
- Using RFC 2047 encoding in parameter values. Writing filename="=?UTF-8?B?...?=" is technically invalid, but widespread. For maximum compatibility, send both: a plain filename with an ASCII fallback and a filename* with the RFC 2231 encoded value.
- Forgetting the trailing asterisk on encoded continuations. filename*1 is a plain text continuation. filename*1* is an encoded continuation. Missing the trailing asterisk means the value will be treated as literal text, not decoded.
- Wrong continuation numbering. Parts must be numbered sequentially starting at 0. Gaps in numbering (0, 1, 3) or starting at 1 instead of 0 will cause parsing failures in strict implementations.
- Using a charset other than UTF-8. While RFC 2231 allows any IANA charset, modern practice is UTF-8 exclusively. Using ISO-8859-1 or Windows-1252 reduces interoperability with international recipients.
- Not providing an ASCII fallback. Some older email clients do not support RFC 2231. Always include a plain filename parameter with an ASCII approximation alongside the filename* parameter.
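The fallback advice in the list above can be sketched as follows. The helper names ascii_fallback and content_disposition are hypothetical, and the approximation strategy (strip accents via NFKD, replace anything else non-ASCII with an underscore) is just one possible assumption; other transliteration schemes would work equally well.

```python
import unicodedata
from urllib.parse import quote


def ascii_fallback(name, placeholder="_"):
    """Approximate a filename in printable ASCII for older clients."""
    out = []
    for ch in unicodedata.normalize("NFKD", name):
        if unicodedata.combining(ch):
            continue  # drop accents that NFKD split off from their base letter
        if ch.isascii() and ch.isprintable() and ch not in '"\\':
            out.append(ch)
        else:
            out.append(placeholder)
    return "".join(out)


def content_disposition(name):
    """Build a header carrying both filename and, when needed, filename*."""
    fallback = ascii_fallback(name)
    header = f'Content-Disposition: attachment; filename="{fallback}"'
    if fallback != name:
        # Only add the encoded form when the plain one lost information.
        header += f"; filename*=UTF-8''{quote(name, safe='')}"
    return header


print(content_disposition("r\u00e9sum\u00e9.pdf"))
print(content_disposition("report.pdf"))
```

Clients that understand filename* are expected to prefer it, while older ones still see a readable ASCII name.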
Deliverability Impact
- Attachment filename display. Incorrect encoding causes attachment filenames to appear as garbled text or question marks in the recipient's mail client. This is confusing and unprofessional, though it does not directly affect whether the message is delivered.
- Spam filter triggers. Malformed MIME parameters can trigger spam filters. Some filters flag messages with encoding anomalies as suspicious, since malware campaigns sometimes use malformed headers to exploit parser vulnerabilities.
- Interoperability across clients. The real-world state of RFC 2231 support varies. Gmail, Apple Mail, and Thunderbird handle it well. Some older enterprise clients do not. The safest approach is to always include both filename (ASCII) and filename* (RFC 2231) parameters.
- Content-Type parameters too. While filename is the most visible use case, RFC 2231 also applies to Content-Type parameters like name and charset, and any other MIME header parameter.