I hesitated for quite a while over whether to post this article, then suddenly realized a whole month has passed since the last one, so this piece will have to fill the gap. It covers a fairly obscure niche anyway.
What is Unicode
For what Unicode is, see https://zh.wikipedia.org/wiki/Unicode
For the rules governing Unicode in domain names (i.e. IDN), see https://www.unicode.org/faq/idn.html
The security risks of Unicode
The biggest security risk Unicode brings is deception, or "spoofing":
A common security issue is ‘spoofing’, the deliberate misspelling of a domain or user name to trick unaware users into entering an interaction with a hostile site as if it was a trusted site.
Many distinct Unicode characters are visually near-identical. A normal person cannot tell аррӏе.com apart from apple.com. (You may spot a slight difference here, but only because the two sit side by side; in an address bar, with nothing to compare against, almost nobody can notice it. That in itself is exploitable: the glyphs in your memory versus a lookalike string shown alone in the address bar are, in practice, indistinguishable.) Yet in fact every single character differs between the two: the first, аррӏе, is composed entirely of Cyrillic letters.
```python
>>> 'аррӏе'.decode("utf-8")
u'\u0430\u0440\u0440\u04cf\u0435'
>>> 'apple'.decode("utf-8")
u'apple'
```
Spoofing is the most common problem with Unicode, but note that in some situations Unicode can do far greater damage.
Take this very interesting example: https://labs.spotify.com/2013/06/18/creative-usernames/
The problem is simple: a site allowed Unicode usernames and used a Python library to canonicalize a username when checking whether it was already taken, the catch being that the canonicalization is not idempotent:
```python
from twisted.words.protocols.jabber.xmpp_stringprep import nodeprep

def canonical_username(name):
    return nodeprep.prepare(name)

canonical_username(u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30')
```
This function lowercases capital letters and maps lookalike Unicode characters much the way Chrome's address bar does. For example, BIG is turned into big, and ƁƗƓ into ɓɨɠ.
To detect duplicate usernames, the site ran this function once over the new name and compared the result: AAA becomes aaa, which collides with a previously registered aaa. But here is the bug. Register ᴬᴬᴬ: the function turns it into AAA, which differs from aaa, so registration succeeds. Later, when the user clicks a password-reset link, the function runs once more, AAA becomes aaa, and the password of user aaa ends up changed by an account that should have no right to it.
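The failure can be reproduced without Twisted. Here is a minimal sketch: my own toy canonicalizer, not the actual nodeprep.prepare, but it is non-idempotent in the same way, because NFKC normalization maps the modifier letter ᴬ (U+1D2C) to plain A:

```python
import unicodedata

def canonical_username(name):
    # Toy stand-in for nodeprep.prepare(), NOT the real Twisted code:
    # lowercase first, then NFKC, which maps compatibility characters
    # such as U+1D2C (MODIFIER LETTER CAPITAL A, 'ᴬ') to plain 'A'.
    # The bug: the output of one pass is not a fixed point of the function.
    return unicodedata.normalize("NFKC", name.lower())

attacker = "\u1d2c\u1d2c\u1d2c"        # 'ᴬᴬᴬ'
once = canonical_username(attacker)    # 'AAA': does not collide with 'aaa'
twice = canonical_username(once)       # 'aaa': collides on the second pass
print(once, twice)
```

Running the canonicalizer on its own output gives a different string, which is exactly the property the duplicate check and the password-reset flow disagreed on.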
Some takeaways from the end of that article:
- You must understand exactly what the user entered, and what it turns into after each layer of processing
- If you want to support Unicode, there are a great many pitfalls
What this example shows is the developers' mishandling of Unicode producing a serious logic flaw. Such cases are the minority, though; most of Unicode's security harm still comes from spoofing, and nowhere is spoofing taken further than in domain names. Imagine the damage a phishing site can do when its domain looks identical, or nearly identical, to the real one.
IDN spoofing
When it comes to handling Unicode spoofing in domain names, Chrome is currently ahead of the pack, although in my view a little too eager, to the point of preferring to wrongly block a thousand rather than let one slip through. The clearest difference between Unicode spoofing and traditional URL spoofing is how hard it is to pin down: fonts and operating systems vary, so a character can render completely differently in Chrome on Windows than on a Mac, and differently again on mobile. To mitigate this somewhat, Chrome maintains a top_domain_list of well-known domains; any domain judged similar to one of the top_domains is forced to display as punycode in the address bar. Getting past all of these layered checks counts as a Unicode spoof vulnerability, although I still think the top_domains list is somewhat too large.
So let's take a look at how Chrome handles this Unicode spoofing.
Chrome's idn_spoof_checker
To explore Chrome's Unicode spoof handling, understanding its rules is essential, and most of the spoof detection lives in the file idn_spoof_checker.cc. Set a first breakpoint at IDNSpoofChecker::IDNSpoofChecker and trace back up the call chain:
```cpp
url_formatter::FormatUrl()
url_formatter::FormatUrlWithAdjustments()
AppendFormattedComponent
IDNToUnicodeWithAdjustments
IDNToUnicodeOneComponent
IsIDNComponentSafe
```
The comment on IsIDNComponentSafe:
```cpp
// Returns true if the given Unicode host component is safe to display to the
// user. Note that this function does not deal with pure ASCII domain labels at
// all even though it's possible to make up look-alike labels with ASCII
// characters alone.
bool IsIDNComponentSafe(base::StringPiece16 label, bool is_tld_ascii) {
  return g_idn_spoof_checker.Get().SafeToDisplayAsUnicode(label, is_tld_ascii);
}
```
One thing worth noting: throughout the call chain the URL is held as punycode, and only after passing IDNSpoofChecker does it get displayed as Unicode. That is exactly the right design, because even if the whole checker function were somehow skipped, what ends up rendered would still be punycode.
Next, step into the IDNSpoofChecker constructor and examine each of its rules.
1.
```cpp
// At this point, USpoofChecker has all the checks enabled except
// for USPOOF_CHAR_LIMIT (USPOOF_{RESTRICTION_LEVEL, INVISIBLE,
// MIXED_SCRIPT_CONFUSABLE, WHOLE_SCRIPT_CONFUSABLE, MIXED_NUMBERS, ANY_CASE})
// This default configuration is adjusted below as necessary.

// Set the restriction level to high. It allows mixing Latin with one logical
// CJK script (+ COMMON and INHERITED), but does not allow any other script
// mixing (e.g. Latin + Cyrillic, Latin + Armenian, Cyrillic + Greek). Note
// that each of {Han + Bopomofo} for Chinese, {Hiragana, Katakana, Han} for
// Japanese, and {Hangul, Han} for Korean is treated as a single logical
// script.
// See http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
uspoof_setRestrictionLevel(checker_, USPOOF_HIGHLY_RESTRICTIVE);
```
Regarding script mixing: Latin may be mixed with one logical CJK (Chinese-Japanese-Korean) script, and no other combination of scripts is allowed.
2.
```cpp
// Sets allowed characters in IDN labels and turns on USPOOF_CHAR_LIMIT.
SetAllowedUnicodeSet(&status);
```
Stepping in: this function strips the allowed set of every character that should not appear in a URL. It starts by adding the characters the Unicode standard recommends for identifiers:
```cpp
// The recommended set is a set of characters for identifiers in a
// security-sensitive environment taken from UTR 39
// (http://unicode.org/reports/tr39/) and
// http://www.unicode.org/Public/security/latest/xidmodifications.txt .
// The inclusion set comes from "Candidate Characters for Inclusion
// in idenfiers" of UTR 31 (http://www.unicode.org/reports/tr31). The list
// may change over the time and will be updated whenever the version of ICU
// used in Chromium is updated.
const icu::UnicodeSet* recommended_set =
    uspoof_getRecommendedUnicodeSet(status);
icu::UnicodeSet allowed_set;
allowed_set.addAll(*recommended_set);

const icu::UnicodeSet* inclusion_set = uspoof_getInclusionUnicodeSet(status);
allowed_set.addAll(*inclusion_set);
```
It then removes:
- U+0338, because it can look like a /
- U+2010, easily confused with -
- U+2019 ('), easily overlooked when it sits next to another character
- U+2027 (‧), easily confused with ·
- U+30A0, confusable with =
- U+02BB and U+02BC, confusable with single and double quotation marks
Next it removes characters that render as blank under macOS:
```cpp
#if defined(OS_MACOSX)
// The following characters are reported as present in the default macOS
// system UI font, but they render as blank. Remove them from the allowed
// set to prevent spoofing until the font issue is resolved.

// Arabic letter KASHMIRI YEH. Not used in Arabic and Persian.
allowed_set.remove(0x0620u);

// Tibetan characters used for transliteration of ancient texts:
allowed_set.remove(0x0F8Cu);
allowed_set.remove(0x0F8Du);
allowed_set.remove(0x0F8Eu);
allowed_set.remove(0x0F8Fu);
#endif
```
Then it removes LGC (Latin, Greek, Cyrillic) blocks that are almost never used:
```cpp
// Disallow extremely rarely used LGC character blocks.
// Cyllic Ext A is not in the allowed set. Neither are Latin Ext-{C,E}.
allowed_set.remove(0x01CDu, 0x01DCu);  // Latin Ext B; Pinyin
allowed_set.remove(0x1C80u, 0x1C8Fu);  // Cyrillic Extended-C
allowed_set.remove(0x1E00u, 0x1E9Bu);  // Latin Extended Additional
allowed_set.remove(0x1F00u, 0x1FFFu);  // Greek Extended
allowed_set.remove(0xA640u, 0xA69Fu);  // Cyrillic Extended-B
allowed_set.remove(0xA720u, 0xA7FFu);  // Latin Extended-D
```
And allowed_set is complete.
3.
After that, several special characters are gathered into sets.
First, the four characters that IDNA 2003 and IDNA 2008 define differently: UTS 46 transitional processing maps U+00DF (ß) to ss and U+03C2 (ς) to σ, and drops U+200C and U+200D, which are invisible zero-width characters.
```cpp
// Four characters handled differently by IDNA 2003 and IDNA 2008. UTS46
// transitional processing treats them as IDNA 2003 does; maps U+00DF and
// U+03C2 and drops U+200[CD].
deviation_characters_ = icu::UnicodeSet(
    UNICODE_STRING_SIMPLE("[\\u00df\\u03c2\\u200c\\u200d]"), status);
deviation_characters_.freeze();
```
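The IDNA 2003 side of this split is visible from Python's built-in idna codec, which implements the 2003-era nameprep rules. This is only an illustration of the deviation, not anything Chrome itself runs:

```python
# Python's built-in "idna" codec implements IDNA 2003 (nameprep), under
# which U+00DF (ß) is case-folded to "ss" before encoding. An IDNA 2008
# encoder keeps ß as a letter instead (the classic example being
# fass.de under 2003 vs xn--fa-hia.de under 2008).
print("fa\u00df.de".encode("idna"))   # b'fass.de'
```

So under the transitional rules, faß.de and fass.de collapse into the same name, which is exactly why these four characters need special handling.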
Latin letters outside ASCII go into a set of their own:
```cpp
// Latin letters outside ASCII. 'Script_Extensions=Latin' is not necessary
// because additional characters pulled in with scx=Latn are not included in
// the allowed set.
non_ascii_latin_letters_ =
    icu::UnicodeSet(UNICODE_STRING_SIMPLE("[[:Latin:] - [a-zA-Z]]"), status);
non_ascii_latin_letters_.freeze();
```
The "dangerous" parts get sets as well: certain Japanese kana letters and the combining diacritical marks:
```cpp
// The following two sets are parts of |dangerous_patterns_|.
kana_letters_exceptions_ = icu::UnicodeSet(
    UNICODE_STRING_SIMPLE("[\\u3078-\\u307a\\u30d8-\\u30da\\u30fb-\\u30fe]"),
    status);
kana_letters_exceptions_.freeze();
combining_diacritics_exceptions_ =
    icu::UnicodeSet(UNICODE_STRING_SIMPLE("[\\u0300-\\u0339]"), status);
combining_diacritics_exceptions_.freeze();
```
The Cyrillic letters that resemble Latin letters get their own set:
```cpp
// These Cyrillic letters look like Latin. A domain label entirely made of
// these letters is blocked as a simplified whole-script-spoofable.
cyrillic_letters_latin_alike_ = icu::UnicodeSet(
    icu::UnicodeString::fromUTF8("[асԁеһіјӏорԛѕԝхуъЬҽпгѵѡ]"), status);
cyrillic_letters_latin_alike_.freeze();

cyrillic_letters_ =
    icu::UnicodeSet(UNICODE_STRING_SIMPLE("[[:Cyrl:]]"), status);
cyrillic_letters_.freeze();
```
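The whole-script check is easy to mimic. A rough sketch in Python; the letter set is copied from the Chromium snippet above, while the function itself is my own simplification of IsMadeOfLatinAlikeCyrillic:

```python
# Cyrillic letters that pass for Latin, copied from the Chromium set above.
LATIN_ALIKE_CYRILLIC = set("асԁеһіјӏорԛѕԝхуъЬҽпгѵѡ")

def whole_script_spoofable(label):
    # True when every character of the label is a Latin-lookalike Cyrillic
    # letter; for an ASCII TLD, Chrome then refuses to show it as Unicode.
    return all(ch in LATIN_ALIKE_CYRILLIC for ch in label)

print(whole_script_spoofable("аррӏе"))   # True: forced to punycode
print(whole_script_spoofable("москва"))  # False: plainly Cyrillic, allowed
```

Note how аррӏе trips the check while an ordinary Cyrillic word like москва does not, since м, к and в are not Latin lookalikes.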
Next comes an optimization step, used to decide whether the current URL needs to be compared against top_domains at all; if it contains any character outside this set, the comparison can be skipped:
```cpp
// This set is used to determine whether or not to apply a slow
// transliteration to remove diacritics to a given hostname before the
// confusable skeleton calculation for comparison with top domain names. If
// it has any character outside the set, the expensive step will be skipped
// because it cannot match any of top domain names.
// The last ([\u0300-\u0339] is a shorthand for "[:Identifier_Status=Allowed:]
// & [:Script_Extensions=Inherited:] - [\\u200C\\u200D]". The latter is a
// subset of the former but it does not matter because hostnames with
// characters outside the latter set would be rejected in an earlier step.
lgc_letters_n_ascii_ = icu::UnicodeSet(
    UNICODE_STRING_SIMPLE("[[:Latin:][:Greek:][:Cyrillic:][0-9\\u002e_"
                          "\\u002d][\\u0300-\\u0339]]"),
    status);
lgc_letters_n_ascii_.freeze();
```
Then the diacritics:
```cpp
// Used for diacritics-removal before the skeleton calculation. Add
// "ł > l; ø > o; đ > d" that are not handled by "NFD; Nonspacing mark
// removal; NFC".
// TODO(jshin): Revisit "ł > l; ø > o" mapping.
UParseError parse_error;
diacritic_remover_.reset(icu::Transliterator::createFromRules(
    UNICODE_STRING_SIMPLE("DropAcc"),
    icu::UnicodeString::fromUTF8("::NFD; ::[:Nonspacing Mark:] Remove; ::NFC;"
                                 " ł > l; ø > o; đ > d;"),
    UTRANS_FORWARD, parse_error, status));
```
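The "NFD; Nonspacing mark removal; NFC" pipeline plus the hand-added ł/ø/đ rules can be sketched with Python's standard library. This is a rough approximation of the ICU transliterator, not its exact behavior:

```python
import unicodedata

def drop_diacritics(host):
    # NFD splits base letters from combining marks, e.g. 'ö' -> 'o' + U+0308.
    decomposed = unicodedata.normalize("NFD", host)
    # Drop nonspacing marks (general category Mn), then recompose.
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # ł, ø and đ have no canonical decomposition, so they are mapped by hand,
    # mirroring the extra rules Chrome appends to the transliterator.
    stripped = stripped.translate(str.maketrans("łøđ", "lod"))
    return unicodedata.normalize("NFC", stripped)

print(drop_diacritics("łódź"))    # lodz
print(drop_diacritics("ápplé"))   # apple
```

The second example shows why this matters for spoofing: a decorated lookalike collapses onto the plain target name before the skeleton comparison.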
Then a rather important step: the mapper holding all the remaining lookalike-character mappings. Most of the Unicode spoofs submitted by researchers end up being added to this mapper:
```cpp
// Supplement the Unicode confusable list by the following mapping.
//   - {U+00FE (þ), U+03FC (ϼ), U+048F (ҏ)} => p
//   - {U+0127 (ħ), U+043D (н), U+045B (ћ), U+04A3 (ң), U+04A5 (ҥ),
//      U+04C8 (ӈ), U+04CA (ӊ), U+050B (ԋ), U+0527 (ԧ), U+0529 (ԩ)} => h
//   - {U+0138 (ĸ), U+03BA (κ), U+043A (к), U+049B (қ), U+049D (ҝ),
//      U+049F (ҟ), U+04A1(ҡ), U+04C4 (ӄ), U+051F (ԟ)} => k
//   - {U+014B (ŋ), U+043F (п)} => n
//   - {U+0167 (ŧ), U+0442 (т), U+04AD (ҭ), U+050F (ԏ)} => t
//   - {U+0185 (ƅ), U+044C (ь), U+048D (ҍ), U+0432 (в)} => b
//   - {U+03C9 (ω), U+0448 (ш), U+0449 (щ), U+0E1F (ฟ)} => w
//   - {U+043C (м), U+04CE (ӎ)} => m
//   - {U+0454 (є), U+04BD (ҽ), U+04BF (ҿ), U+1054 (ၔ)} => e
//   - U+0491 (ґ) => r
//   - {U+0493 (ғ), U+04FB (ӻ)} => f
//   - {U+04AB (ҫ), U+1004 (င)} => c
//   - U+04B1 (ұ) => y
//   - U+03C7 (χ), U+04B3 (ҳ), U+04FD (ӽ), U+04FF (ӿ) => x
//   - U+0503 (ԃ) => d
//   - {U+050D (ԍ), U+100c (ဌ)} => g
//   - {U+0D1F (ട), U+0E23 (ร)} => s
//   - U+1042 (၂) => j
//   - {U+0437 (з), U+04E1 (ӡ)} => 3
extra_confusable_mapper_.reset(icu::Transliterator::createFromRules(
    UNICODE_STRING_SIMPLE("ExtraConf"),
    icu::UnicodeString::fromUTF8("[þϼҏ] > p; [ħнћңҥӈӊԋԧԩ] > h;"
                                 "[ĸκкқҝҟҡӄԟ] > k; [ŋп] > n; [ŧтҭԏ] > t;"
                                 "[ƅьҍв] > b; [ωшщฟ] > w; [мӎ] > m;"
                                 "[єҽҿၔ] > e; ґ > r; [ғӻ] > f; [ҫင] > c;"
                                 "ұ > y; [χҳӽӿ] > x;"
                                 "ԃ > d; [ԍဌ] > g; [ടร] > s; ၂ > j;"
                                 "[зӡ] > 3"),
    UTRANS_FORWARD, parse_error, status));
```
With that, the IDN spoof checker is fully built. Execution continues into the url_formatter stage to format the URL,
and then reaches the function that actually decides whether a URL may be displayed as Unicode: IDNSpoofChecker::SafeToDisplayAsUnicode.
First comes this check: when there is no script mixing, the label is immediately considered safe unless it falls into one of three categories: it contains the kana exception letters, it is made entirely of Latin-lookalike Cyrillic letters under an ASCII TLD, or it carries combining diacritics.
```cpp
// If there's no script mixing, the input is regarded as safe without any
// extra check unless it falls into one of three categories:
//   - contains Kana letter exceptions
//   - the TLD is ASCII and the input is made entirely of Cyrillic letters
//     that look like Latin letters.
//   - it has combining diacritic marks.
// Note that the following combinations of scripts are treated as a 'logical'
// single script.
//  - Chinese: Han, Bopomofo, Common
//  - Japanese: Han, Hiragana, Katakana, Common
//  - Korean: Hangul, Han, Common
result &= USPOOF_RESTRICTION_LEVEL_MASK;
if (result == USPOOF_ASCII)
  return true;
if (result == USPOOF_SINGLE_SCRIPT_RESTRICTIVE &&
    kana_letters_exceptions_.containsNone(label_string) &&
    combining_diacritics_exceptions_.containsNone(label_string)) {
  // Check Cyrillic confusable only for ASCII TLDs.
  return !is_tld_ascii || !IsMadeOfLatinAlikeCyrillic(label_string);
}
```
If scripts are mixed, the next check kicks in: non-ASCII Latin letters must not be mixed with a non-Latin script.
```cpp
// Additional checks for |label| with multiple scripts, one of which is Latin.
// Disallow non-ASCII Latin letters to mix with a non-Latin script.
// Note that the non-ASCII Latin check should not be applied when the entire
// label is made of Latin. Checking with lgc_letters set here should be fine
// because script mixing of LGC is already rejected.
if (non_ascii_latin_letters_.containsSome(label_string) &&
    !lgc_letters_n_ascii_.containsAll(label_string))
  return false;
```
Finally come the "dangerous parts" mentioned earlier; the comments here are quite thorough:
```cpp
icu::RegexMatcher* dangerous_pattern =
    reinterpret_cast<icu::RegexMatcher*>(DangerousPatternTLS().Get());
if (!dangerous_pattern) {
  // Disallow the katakana no, so, zo, or n, as they may be mistaken for
  // slashes when they're surrounded by non-Japanese scripts (i.e. scripts
  // other than Katakana, Hiragana or Han). If {no, so, zo, n} next to a
  // non-Japanese script on either side is disallowed, legitimate cases like
  // '{vitamin in Katakana}b6' are blocked. Note that trying to block those
  // characters when used alone as a label is futile because those cases
  // would not reach here.
  // Also disallow what used to be blocked by mixed-script-confusable (MSC)
  // detection. ICU 58 does not detect MSC any more for a single input string.
  // See http://bugs.icu-project.org/trac/ticket/12823 .
  // TODO(jshin): adjust the pattern once the above ICU bug is fixed.
  // - Disallow U+30FB (Katakana Middle Dot) and U+30FC (Hiragana-Katakana
  //   Prolonged Sound) used out-of-context.
  // - Dislallow U+30FD/E (Katakana iteration mark/voiced iteration mark)
  //   unless they're preceded by a Katakana.
  // - Disallow three Hiragana letters (U+307[8-A]) or Katakana letters
  //   (U+30D[8-A]) that look exactly like each other when they're used in a
  //   label otherwise entirely in Katakna or Hiragana.
  // - Disallow U+0585 (Armenian Small Letter Oh) and U+0581 (Armenian Small
  //   Letter Co) to be next to Latin.
  // - Disallow combining diacritical mark (U+0300-U+0339) after a non-LGC
  //   character. Other combining diacritical marks are not in the allowed
  //   character set.
  // - Disallow dotless i (U+0131) followed by a combining mark.
  // - Disallow U+0307 (dot above) after 'i', 'j', 'l' or dotless i (U+0131).
  //   Dotless j (U+0237) is not in the allowed set to begin with.
  dangerous_pattern = new icu::RegexMatcher(
      icu::UnicodeString(
          R"([^\p{scx=kana}\p{scx=hira}\p{scx=hani}])"
          R"([\u30ce\u30f3\u30bd\u30be])"
          R"([^\p{scx=kana}\p{scx=hira}\p{scx=hani}]|)"
          R"([^\p{scx=kana}\p{scx=hira}]\u30fc|^\u30fc|)"
          R"([^\p{scx=kana}][\u30fd\u30fe]|^[\u30fd\u30fe]|)"
          R"(^[\p{scx=kana}]+[\u3078-\u307a][\p{scx=kana}]+$|)"
          R"(^[\p{scx=hira}]+[\u30d8-\u30da][\p{scx=hira}]+$|)"
          R"([a-z]\u30fb|\u30fb[a-z]|)"
          R"([^\p{scx=latn}\p{scx=grek}\p{scx=cyrl}][\u0300-\u0339]|)"
          R"(\u0131[\u0300-\u0339]|)"
          R"([ijl]\u0307)",
          -1, US_INV),
      0, status);
  DangerousPatternTLS().Set(dangerous_pattern);
}
dangerous_pattern->reset(label_string);
return !dangerous_pattern->find();
```
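One sub-pattern of the regex above can be reproduced with Python's standard re module (the full pattern needs \p{scx=...} script properties, which stdlib re lacks). This covers only the dot-above rule, merging the i/j/l and dotless-i alternatives into one simplified class:

```python
import re

# Simplified version of the last alternatives of Chrome's dangerous pattern:
# a combining dot above (U+0307) following i, j, l or dotless i (U+0131).
# 'ı' + U+0307 composes into something that renders like a plain 'i'.
dot_above_after_i = re.compile("[ijl\u0131]\u0307")

print(bool(dot_above_after_i.search("go\u0131\u0307ogle")))  # True: blocked
print(bool(dot_above_after_i.search("google")))              # False: fine
```

This is exactly the trick behind the combining-character spoof bug discussed later in this article.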
Once all the processing above has passed, the label is compared against the entries in top_domains; if it matches one, the URL is displayed as punycode:
```cpp
bool IDNSpoofChecker::SimilarToTopDomains(base::StringPiece16 hostname) {
  size_t hostname_length = hostname.length() - (hostname.back() == '.' ? 1 : 0);
  icu::UnicodeString host(FALSE, hostname.data(), hostname_length);
  // If input has any characters outside Latin-Greek-Cyrillic and [0-9._-],
  // there is no point in getting rid of diacritics because combining marks
  // attached to non-LGC characters are already blocked.
  if (lgc_letters_n_ascii_.span(host, 0, USET_SPAN_CONTAINED) == host.length())
    diacritic_remover_.get()->transliterate(host);

  extra_confusable_mapper_.get()->transliterate(host);

  UErrorCode status = U_ZERO_ERROR;
  icu::UnicodeString skeleton;

  // Map U+04CF (ӏ) to lowercase L in addition to what uspoof_getSkeleton does
  // (mapping it to lowercase I).
  int32_t u04cf_pos;
  if ((u04cf_pos = host.indexOf(0x4CF)) != -1) {
    icu::UnicodeString host_alt(host);
    size_t length = host_alt.length();
    char16_t* buffer = host_alt.getBuffer(-1);
    for (char16_t* uc = buffer + u04cf_pos; uc < buffer + length; ++uc) {
      if (*uc == 0x4CF)
        *uc = 0x6C;  // Lowercase L
    }
    host_alt.releaseBuffer(length);
    uspoof_getSkeletonUnicodeString(checker_, 0, host_alt, skeleton, &status);
    if (U_SUCCESS(status) && LookupMatchInTopDomains(skeleton))
      return true;
  }
  uspoof_getSkeletonUnicodeString(checker_, 0, host, skeleton, &status);
  return U_SUCCESS(status) && LookupMatchInTopDomains(skeleton);
}
```
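Putting the last stages together: compute a "skeleton" of the label, then look it up among the top domains. A toy Python version with a tiny hypothetical top-domain list and only a handful of mappings; the real skeleton comes from ICU's uspoof_getSkeleton plus the ExtraConf mapper and the U+04CF special case shown above:

```python
# A few confusable mappings for illustration only. Chrome gets the bulk of
# these from ICU's confusables data; 'ӏ' (U+04CF) -> 'l' mirrors the
# special case in SimilarToTopDomains above.
TOY_CONFUSABLES = str.maketrans({"а": "a", "р": "p", "е": "e", "ӏ": "l"})

# Hypothetical stand-in for Chrome's top_domains list.
TOP_DOMAINS = {"apple", "google"}

def similar_to_top_domain(label):
    # Skeletonize, then check for a collision with a protected name.
    return label.translate(TOY_CONFUSABLES) in TOP_DOMAINS

print(similar_to_top_domain("аррӏе"))  # True: forced to punycode
```

The design point to notice is that the skeleton is one-way: both apple and аррӏе map onto the same string, so the lookup catches the spoof without ever needing to render either.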
Security risks created by the rules
Having walked through all of this step by step, Chrome appears to offer a fairly complete solution, yet while reading you can already spot several types of Unicode vulnerabilities, possible or actual. Here is my classification of the Unicode spoof vulnerabilities that have appeared, with a few cases to make each one clearer.
- The conventional spoof of [a-z]: finding characters that resemble them. This divides into whole-script spoofing and mixed-script spoofing; the classic example is the all-Cyrillic аррӏе.com mentioned earlier, see https://bugs.chromium.org/p/chromium/issues/detail?id=683314
- Spoofing through combining characters, mainly when a combining character fuses with the preceding character into a new glyph. For example, https://bugs.chromium.org/p/chromium/issues/detail?id=750239 pairs i, j and ı with the combining dot above, U+0307
- Spoofing of special ASCII characters, e.g. spoofs of - and /, the characters mentioned above that render as blank on macOS, and potentially spoofs of . and @
- Spoofing with RTL characters
- Spoofing of top-level domains. This one is special; it comes from a Unicode spoof I reported earlier. Strictly speaking it does not target a real TLD but the pseudo-TLD co.com, see http://registry.co.com/. That site lets people who cannot register a name under .com register it under .co.com instead: if I want qq.com but it is taken, I can settle for qq.co.com. Which means that a spoof of the two letters "co" enables phishing against any subdomain under .co.com, including the registry site itself, http://registry.co.com/. What I submitted was the Myanmar pair ငဝ, but Chrome did not accept that vulnerability: they have no mechanism yet for spoofs of this kind of "TLD", and co.com is not in their top_domains, so to this day ငဝ.com is not blocked. The report did earn a bounty, but for some other, less similar Myanmar characters, which felt rather arbitrary to me; I still think co.com should be added to top_domains. As I was writing this article I discovered that the domain registry.ငဝ.com now exists, registered by someone in India.
Attack ideas
Each of the many steps Chrome takes to filter Unicode has its rationale, but each step also points straight at possible attacks and spoofing techniques. Let's follow the flow we just debugged and see, step by step, what attack ideas Chrome's rules suggest.
Step one: ASCII is allowed to mix with one CJK script. We know CJK characters well; Chinese, Japanese and Korean glyphs could hardly look less like ASCII, yet problems remain even here, such as spoofing - or abusing combining characters.
Step two: characters resembling - / · = are removed, which tells you in so many words to go hunting for other characters that resemble - and / and the rest.
Step three: characters that render as blank on macOS are removed. Are there characters that render as blank on Windows, Linux, iOS or Android?
……
I'll leave the rest unwritten, lol. Out of steam.
References
- http://www.unicode.org/versions/Unicode11.0.0/UnicodeStandard-11.0.pdf
- http://www.unicode.org/reports/tr36/
- http://www.unicode.org/reports/tr39/
- https://www.unicode.org/faq/security.html
- https://www.unicode.org/reports/tr46/
- https://www.unicode.org/faq/idn.html
- https://unicode-table.com/cn/#control-character
- http://unicode.org/cldr/utility/confusables.jsp?a=a&r=None