[心得][.NET] GetHashCode()

时间Tue Nov 22 22:34:31 2016

※ [本文转录自 C_Sharp 看板 #1OD5PWGf ] 作者: AmosYang (泛用人型编码器) 看板: C_Sharp 标题: [心得] GetHashCode() 时间: Tue Nov 22 22:28:12 2016 我写了一篇文章，对 GetHashCode() 这个方法有兴趣者，可以看看 * HTML: http://www.30abysses.com/TWY/2016/11/21/c_sharp-gethashcode.html * 文字: http://www.30abysses.com/TWY/2016/11/21/c_sharp-gethashcode.md * 前 1/3 「GetHashCode() 简单来说就是……」是偏向於初学者 * 中 1/3 「最古の四人， GetHashCode()」、「GetHashCode() 守则(guideline)与法则(rule)」是比较进阶的内容，整理自官方文件与 Eric Lippert 的文章 * 後 1/3 「探索 `GetHashCode()` 原始码」是直接连结到 github.com 上 coreclr 的原始码，给「觉得看程式码比看文字来得轻松」者看的 :D HTML 版的 CSS 我还要再调整调整，目前在 Chrome, IE, Edge 三家浏览器上看有相当微妙的差异 orz ；如果太伤眼的话，先看文字版或 PTT web 版吧 :D https://webptt.com/cn.aspx?n=bbs/C_Sharp/M.1479824992.A.429.html ======================================================================== > http://www.30abysses.com/TWY/2016/11/21/c_sharp-gethashcode.md > by TW Yang <[email protected]> 2016-11-21 CC-BY-4.0 # C# GetHashCode() [`GetHashCode()`][1] 这个方法(method)有着《魔偶马戏团》中「[四大元老][2]」（[最古の四人][3]）的地位，直属 [`System.Object`][4] 类别(class) ，栖身於 `mscorlib` 此组件(assembly)中。网路上是可以找到些关於 `GetHashCode()` 的零碎讨论，但这篇文章将从最有权威性(authoritative) 的资料来源重新出发，再逐步探讨细节。 [1]: https://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx [2]: https://zh.wikipedia.org/zh-tw/%E5%82%80%E5%84%A1%E9%A6%AC%E6%88%B2%E5%9C%98%E8%A7%92%E8%89%B2%E5%88%97%E8%A1%A8#.E5.9B.9B.E5.A4.A7.E5.85.83.E8.80.81 [3]: https://ja.wikipedia.org/wiki/%E3%81%8B%E3%82%89%E3%81%8F%E3%82%8A%E3%82%B5%E3%83%BC%E3%82%AB%E3%82%B9%E3%81%AE%E7%99%BB%E5%A0%B4%E4%BA%BA%E7%89%A9#.E6.9C.80.E5.8F.A4.E3.81.AE.E5.9B.9B.E4.BA.BA.EF.BC.88.E3.83.AC.E3.83.BB.E3.82.AD.E3.83.A3.E3.83.88.E3.83.AB.E3.83.BB.E3.83.94.E3.82.AA.E3.83.8D.E3.83.BC.E3.83.AB.EF.BC.89 [4]: https://msdn.microsoft.com/en-us/library/system.object.aspx ## 资料来源 * [MSDN `GetHashCode()` 官方文件][1] （[官方机器翻译版本][5]） * [`coreclr` 原始码(source code)][6] * "[Guidelines and rules for GetHashCode][7]" by _le_ **Eric Lippert** [5]: https://msdn.microsoft.com/zh-tw/library/system.object.gethashcode.aspx [6]: https://github.com/dotnet/coreclr [7]: https://blogs.msdn.microsoft.com/ericlippert/2011/02/28/guidelines-and-rules-for-gethashcode/ 微软(Microsoft)除了开源 `.NET Core` ，目前也有释出其他版本的 `.NET Framework` 参考用原始码如下。 * https://github.com/Microsoft/referencesource/ * https://referencesource.microsoft.com/ 经过大略的比对後，就 `GetHashCode()` 看起来是几乎完全相同的；在此以 `coreclr` 版本为主。 # `GetHashCode()` 简单来说就是…… 在试着理解 `GetHashCode()` 的功能前，得先理解它要解决的问题： > 比较两个物件(object)是否「相同」更完整的说法是： > 有一物件甲，有一物件乙，试问：「甲、乙是否『相同』？」所谓「相同」在此是个很抽象的观念，在不同的场合有不同的解释。例如，假设甲、乙是「书」，那麽通常来说，如果它们的 [国际标准书号][8]([ISBN][9])相同，那我们就可以说甲、乙是两本书是「相同的书」。相对的，如果甲、乙这两本书的国际标准书号不同，我们就会说甲、乙是「不同的书」。 [8]: https://zh.wikipedia.org/zh-tw/%E5%9C%8B%E9%9A%9B%E6%A8%99%E6%BA%96%E6%9B%B8%E8%99%9F [9]: https://en.wikipedia.org/wiki/International_Standard_Book_Number 然而，视场合不同，对「相同」在定义上严谨度的要求也不同；例如，就算甲、乙两本书有一样的国际标准书号，其内容仍有可能相异（书有可能缺页、被涂改）。如果是在监定书籍，那就可能要逐字逐页考证；可以想像，那将是很费功夫的事。易言之： * 如果两本书的国际标准书号不同，那几乎可以 100% 安全地决定「完全不需要花功夫去逐字逐页比对」。 * 如果两本书的国际标准书号相同，那就必须进一步考量「是否需要花功夫去逐字逐页比对」。再进一步说： * 即使两本书的国际标准书号相同，最後比对的结果仍可能发现这两本书并不 100% 相同（缺页、被涂改）。反过来说： * 两本事实上并不 100% 相同的书（缺页、被涂改），仍有可能有相同的国际标准书号。 ## 所以说，那个酱汁^H^H `GetHashCode()` 就是…… 在 C# 程式的领域中， `GetHashCode()` 的作用就像上述例子中的「国际标准书号」，提供一个快速筛选物件的作法；从其签章(signature) ``` public virtual int GetHashCode() ``` 可以看出，它的设计是传回一个 `int` 来代表该物件；而 `int` 的好处之一，就是可以快速简单地比较出异同。易言之，与上述的「书、国际标准书号」的例子对应起来，就会是这个样子： * 如果两个物件由 `GetHashCode()` 传回值不同，那就不需要进一步比对。 * 如果两个物件由 `GetHashCode()` 传回值相同，那就有需要进一步比对。 * 即使两个物件由 `GetHashCode()` 传回值相同，最後比对的结果仍可能发现这两个物件并不 100% 相同。 * 两个事实上不 100% 相同的物件，其 `GetHashCode()` 传回值仍有可能相同。对应到[官方文件][1] 的话，就是这段： > Two objects that are equal return hash codes that are equal. However, > the reverse is not true: equal hash codes do not imply object > equality, because different (unequal) objects can have identical hash > codes. [官方机器翻译][5] ： > 等於传回杂凑程式码是相等的两个物件。不过，反向并不成立︰相等的杂凑程 > 式码不一定代表物件是否相等，因为不同（相等）的物件可以有相同的杂凑码 > 。我的手动释译： > 相同的物件传回相同的杂凑(hash)值。然而，这不代表「传回相同杂凑值的都是 > 相同的物件」；因为相异的物件仍可能传回相同的杂凑值。用逻辑来说就是无法从「Ｐ→Ｑ」得出「Ｑ→Ｐ」；也就是 **无法** 从 > Ｐ(相同／相等物件) → Ｑ(传回相同杂凑值) 得到 > Ｑ(传回相同杂凑值) → Ｐ(相同／相等物件) ## `GetHashCode()` 如何知道该怎麽判断／计算其传回值？简单地说：「施主，这个问题你应该要问你自己。」 `GetHashCode()` 只有定义其传回值的规则，真正的实作是取决於使用 `GetHashCode()` 这个架构的人。反过来说，制定 `GetHashCode()` 架构的人无法事先预知在各种不同场合中对各种不同物件种类判定异同的规则，是故，他只能在抽象层面上定义传回值代表的意义以及说明 .NET Framework 会如何使用 `GetHashCode()` 的传回值。通常来说，除非你遇到要处理相等性(equality)及其沿生出的东西，例如 `Dictionary`, `HashSet`, `Hashtable`, **`LINQ`**, 不然，通常来说并不用特别烦恼 `GetHashCode()` 的问题；常用的基本资料型态，例如 `int`, `string`, `double`, 等等，都已有内建的 `GetHashCode()` ，应该能应付一般的使用情形。 ## 为什麽「相异的物件仍可能传回相同的杂凑值」？简单地说，[鸽巢原理][10]([Pigeonhole principle][11])。 [10]: https://zh.wikipedia.org/zh-tw/%E9%B4%BF%E5%B7%A2%E5%8E%9F%E7%90%86 [11]: https://en.wikipedia.org/wiki/Pigeonhole_principle 思考一下： > 在十进位系统里，若只使用一位数，能表示出最多几种独特(unique)的编号？答案是 10 种，也就是 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 。进一步思考： > 在十进位系统里，若只使用一位数为 11 个物件编号，是否有可能让每个物件都 > 有其独一无二的编号？答案是不可能。因为为前 10 个物件编号时，就会用完仅有的 10 个独特编号，而最後一个物件就 **必须** 挑一个数字重覆使用。这就是[鸽巢原理][10]说的： > 若有n个笼子和n+1只鸽子，所有的鸽子都被关在鸽笼里，那麽至少有一个笼子有 > 至少2只鸽子。回头来看 `GetHashCode()` ，其传回的 `int` 是一个 32 位元的数字，最多只能表示大约 42.9 亿种独特的状态，也就是 -2147483648, -2147483647, -2147483646, ... -1, 0, 1, ... 2147483645, 2147483646, 2147483647 这 4294967295 种状态。如果有一种物件具有超过 42.9 亿种独特的状态，那麽在为前 42.9 亿种独特状态编号之後，就会把每个 `int` 能代表的数字都会用过一次，接下来，在数学逻辑上无法避免的，必须重覆使用之前用过的数字来编号，也就是所谓的 [碰撞][12]([collision][13]) 。 [12]: https://zh.wikipedia.org/zh-tw/%E7%A2%B0%E6%92%9E_(%E8%A8%88%E7%AE%97%E6%A9%9F%E7%A7%91%E5%AD%B8) [13]: https://en.wikipedia.org/wiki/Collision_(computer_science) ### 超过 42.9 亿种独特状态的物件真的存在吗？ `long`, 64 位元的数字，可以表示大约 18.4 艾种状态；一艾是 10 的 18 次方，也就是是一亿（10 的 8 次方）的一百亿倍（10 的 10 次方）。参考资料与官方机器翻译： * `int` * https://msdn.microsoft.com/en-us/library/5kzh1b5w.aspx * https://msdn.microsoft.com/zh-tw/library/5kzh1b5w.aspx * `long` * https://msdn.microsoft.com/en-us/library/ctetwysk.aspx * https://msdn.microsoft.com/zh-tw/library/ctetwysk.aspx # [最古の四人][3]， `GetHashCode()` _le_ **Eric Lippert** 在他的文章 "[Guidelines and rules for GetHashCode][7]" 中提出了很有趣的问题： > Why do we have this method on Object in the first place? > > 究竟为什麽这个方法(`GetHashCode()`)是宣告在 `Object` 上？ Eric Lippert 检视其他宣告在 [`System.Object`][4] 上的方法： > It makes perfect sense that every object in the type system should > provide a GetType method; data’s ability to describe itself is a key > feature of the CLR type system. And it makes sense that every object > should have a ToString, so that it is able to print out a > representation of itself as a string, for debugging purposes. It seems > plausible that objects should be able to compare themselves to other > objects for equality. But why should it be the case that every object > should be able to hash itself for insertion into a hash table? Seems > like an odd thing to require every object to be able to do. 摘要释译如下： * `GetType()`: 「资料『自我描述』的能力」是 CLR 类别系统的关键特点之一 * `ToString()`: 为了除错，提供以「字串」印出自己的能力 * `Equals()`: 感觉上物件应要能与其它物件比较「相等性」 * `GetHashCode()`: **但是** ，为什麽强制要求所有的物件都要能计算杂凑值？ Eric Lippert 的感想： > I think if we were redesigning the type system from scratch today, > hashing might be done differently, perhaps with an IHashable > interface. But when the CLR type system was designed there were no > generic types and therefore a general-purpose hash table needed to be > able to store any object. 摘要释译如下： > 我想，如果我们今天从头重新设计类别系统，大概会用不同方法来实作杂凑；或 > 许会用个 `IHashable` 介面。但是，当初设计 CLR 类别系统时尚无泛型类别 > ，是故一个通用的杂凑表需要能储存任何物件。我对 Eric Lippert 这两段文字的解读是：类似 [`System.IComparable`][14] ， **或许** `GetHashCode()` 应该属於像这样的一个介面 ``` interface IHashable { int GetHashCode(); } ``` **或许** 那样作会比较乾净(clean) 与优雅(elegant) 。 [14]: https://msdn.microsoft.com/en-us/library/system.icomparable.aspx 然而，就我自己对 C# / .NET 的感觉是： > 我觉得把 `GetHashCode()` 宣告在 `System.Object` 上，且提供了一个在通 > 常情形下堪用的预设实作，是在理论与实务上取得了 **适当的平衡** 。所谓「理论与实务上的适当的平衡」的实例之一，就是「读写档案」。在我印像中的 Java, 要读写档案就得要把一堆有的没的五四三 stream, reader, writer 组合起来；但在 .NET / C#, 读写(小)档案时， [`System.IO.File`][15] 已经把最常见的动作通通包装好，并且後来更进一步加入类似 streaming 的支援；但若想更精确地控制读写档案的缓冲(buffer)等行为，还是可以回头去组装 stream, reader, writer 以达成目的。易言之，就是在实务上作到 * 让常用的功能「方便(convenient)」。 * 让复杂的动作「可能(possible)」。而不是过度偏重理论上的乾净与优雅。 [15]: https://msdn.microsoft.com/en-us/library/system.io.file.aspx # `GetHashCode()` 守则(guideline)与法则(rule) 基本上，[MSDN `GetHashCode()` 官方文件][1] 已提供了相当的资讯，但其机器翻译版本实在惨不忍睹，甚至有错…… `orz` 是故，以下摘要释译之。同时， [Eric Lippert 的文章][7]也值得细读。有些地方会同时作些小实验，测试环境如下： * 64-bit W10 Pro 1607 * VS2015 Update 3 ## Ｐ(相同／相等物件) → Ｑ(传回相同杂凑值) 如果相同／相等物件传回不同杂凑值，那代表着 `GetHashCode()` 或 `Equals()` 里有臭虫。 ## Ｑ(传回相同杂凑值) →╳→ Ｐ(相同／相等物件) ``` Console.WriteLine(1L.GetHashCode()); Console.WriteLine(4294967296L.GetHashCode()); ``` 很明显的， `1L` 与 `4294967296L` 是不同／相异的物件，但两者都会传回相同的杂凑值: `1` 。 ## 不要将杂凑值储存於外部 > `GetHashCode()` 传回值应只在同 process 下的同一 application domain 内 > 使用；不同 .NET Framework 间的 `GetHashCode()` 实作有可能改变测试方法： ``` Console.WriteLine("Hello, world!".GetHashCode()); ``` 测试结果： * .NET 2.0, 3.0, 3.5, 4.0 传回 307044849 * .NET 4.5.*, 4.6.* 传回 904533705 * .NET Core 1.0.1 传回 -1650644819 ## 不要将 `GetHashCode()` 产生的杂凑值用在密码学(cryptography)用途上简单地说，「闪开！让专业的来！」 * https://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithcn.aspx * https://msdn.microsoft.com/en-us/library/system.security.cryptography.keyedhashalgorithcn.aspx ## （只要物件本身没改变） `GetHashCode()` 应该要有一致性只要物件本身没有足以影响 `Equals()` 结果的变化，那 `GetHashCode()` 产生的杂凑也就不应变化。（但这只限於同一 process 下同一 application domain ）。尤其是当此物件被存放在杂凑表这类「重度依赖杂凑值以确保其运作正确性」的资料结构中时，若 `GetHashCode()` 无法传回一致的杂凑值，可以想见这会造成各种混乱。 ## `GetHashCode()` 的计算需求要少、要快 `GetHashCode()` 本来的目的就是协助、加快比较物件之间的相等性；如果 `GetHashCode()` 要花大量时间、资源运算，那反而本末倒置了。 ## `GetHashCode()` 不应丢出例外(exception) 官方原文： > The GetHashCode method should not throw exceptions. 官方机器翻译： > GetHashCode 方法应该掷回例外状况。 ## `GetHashCode()` 产生的杂凑值应要平均分布愈能平均分布，通常愈能避免碰撞。同时，要考量到输入资料的性质； Eric Lippert 有提到个案例：「[美国邮递区号][16]」，通通都是五个数字字元；而他当时写的杂凑演算法无法有效地就这样的资料产生平均分布的杂凑值（也就是产生了大量的碰撞问题），最後严重拖累系统效能。另外一点是安全上的考量；如果杂凑值演算法中有来自使用者的输入资料，那麽在理论上就有可能让恶意使用者有机可趁，想办法产生大量的碰撞，拖累你的系统效能。 [16]: https://simple.wikipedia.org/wiki/ZIP_code # 探索 `GetHashCode()` 原始码这个章节摘录一些内建的 `GetHashCode()` 实作，作为参考。篇幅较大的原始码（尤其是 C++ 原始码），只有附上连结，就不再转录。以下这个连结可以列出 `System` 下所有覆写(override) `GetHashCode()` 的类别： https://github.com/dotnet/coreclr/search?utf8=%E2%9C%93&q=%22public+override+int+GetHashCode%22+extension%3Acs+path%3A%2Fsrc%2Fmscorlib%2Fsrc%2FSystem ## `System.Object` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Object.cs#L83-L95 ``` // GetHashCode is intended to serve as a hash function for this object. // Based on the contents of the object, the hash function will return a suitable // value with a relatively random distribution over the various inputs. // // The default implementation returns the sync block index for this instance. // Calling it on the same object multiple times will return the same value, so // it will technically meet the needs of a hash function, but it's less than ideal. // Objects (& especially value classes) should override this method. // public virtual int GetHashCode() { return RuntimeHelpers.GetHashCode(this); } ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs#L169-L171 ``` [System.Security.SecuritySafeCritical] // auto-generated [MethodImplAttribute(MethodImplOptions.InternalCall)] public static extern int GetHashCode(Object o); ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/ecalllist.h#L2354 ``` FCClassElement("RuntimeHelpers", "System.Runtime.CompilerServices", gCompilerFuncs) ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/ecalllist.h#L1855 ``` FCFuncElement("GetHashCode", ObjectNative::GetHashCode) ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/classlibnative/bcltype/objectnative.cpp https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/classlibnative/bcltype/objectnative.cpp#L101-L148 ``` static FCDECL1(INT32, GetHashCode, Object* vThisRef); ``` 最後来到 https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/object.h#L472 https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/object.cpp#L75-L155 ``` INT32 GetHashCodeEx(); ``` ## `System.Int32` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Int32.cs#L76-L78 ``` public override int GetHashCode() { return m_value; } ``` ## `System.UInt32` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/UInt32.cs#L76-L79 ``` // The absolute value of the int contained. public override int GetHashCode() { return ((int) m_value); } ``` ## `System.Int64` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Int64.cs#L75-L77 ``` public override int GetHashCode() { return (unchecked((int)((long)m_value)) ^ (int)(m_value >> 32)); } ``` ## `System.UInt64` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/UInt64.cs#L73-L76 ``` // The value of the lower 32 bits XORed with the uppper 32 bits. public override int GetHashCode() { return ((int)m_value) ^ (int)(m_value >> 32); } ``` ## `System.Double` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Double.cs#L190-L199 ``` [System.Security.SecuritySafeCritical] public unsafe override int GetHashCode() { double d = m_value; if (d == 0) { // Ensure that 0 and -0 have the same hash code return 0; } long value = *(long*)(&d); return unchecked((int)value) ^ ((int)(value >> 32)); } ``` ## `System.Decimal` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Decimal.cs#L441-L445 ``` // Returns the hash code for this Decimal. // [System.Security.SecuritySafeCritical] // auto-generated [MethodImplAttribute(MethodImplOptions.InternalCall)] public extern override int GetHashCode(); ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/classlibnative/bcltype/decimal.h#L26 https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/classlibnative/bcltype/decimal.cpp#L102-L126 ``` static FCDECL1(INT32, GetHashCode, DECIMAL *d); ``` ## `System.String` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/String.Comparison.cs#L999-L1013 ``` // Gets a hash code for this string. If strings A and B are such that A.Equals(B), then // they will return the same hash code. [System.Security.SecuritySafeCritical] // auto-generated [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)] public override int GetHashCode() { #if FEATURE_RANDOMIZED_STRING_HASHING if (HashHelpers.s_UseRandomizedStringHashing) { return InternalMarvin32HashString(this, this.Length, 0); } #endif // FEATURE_RANDOMIZED_STRING_HASHING return GetLegacyNonRandomizedHashCode(); } ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/String.Comparison.cs#L1015-L1069 ``` // Use this if and only if you need the hashcode to not change across app domains (e.g. you have an app domain agile // hash table). [System.Security.SecuritySafeCritical] // auto-generated [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)] internal int GetLegacyNonRandomizedHashCode() { unsafe { fixed (char* src = &m_firstChar) { Contract.Assert(src[this.Length] == '\0', "src[this.Length] == '\\0'"); Contract.Assert( ((int)src)%4 == 0, "Managed string should start at 4 bytes boundary"); #if BIT64 int hash1 = 5381; #else // !BIT64 (32) int hash1 = (5381<<16) + 5381; #endif int hash2 = hash1; #if BIT64 int c; char *s = src; while ((c = s[0]) != 0) { hash1 = ((hash1 << 5) + hash1) ^ c; c = s[1]; if (c == 0) break; hash2 = ((hash2 << 5) + hash2) ^ c; s += 2; } #else // !BIT64 (32) // 32 bit machines. int* pint = (int *)src; int len = this.Length; while (len > 2) { hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0]; hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1]; pint += 2; len -= 4; } if (len > 0) { hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0]; } #endif #if DEBUG // We want to ensure we can change our hash function daily. // This is perfectly fine as long as you don't persist the // value from GetHashCode to disk or count on String A // hashing before string B. Those are bugs in your code. hash1 ^= ThisAssembly.DailyBuildNumber; #endif return hash1 + (hash2 * 1566083941); } } } ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/String.Comparison.cs#L982-L986 ``` // Do not remove! // This method is called by reflection in System.Xml [System.Security.SecurityCritical] [MethodImplAttribute(MethodImplOptions.InternalCall)] internal static extern int InternalMarvin32HashString(string s, int strLen, long additionalEntropy); ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/ecalllist.h#L227 ``` FCFuncElement("InternalMarvin32HashString", COMString::Marvin32HashString) ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/classlibnative/bcltype/stringnative.h#L89 https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/classlibnative/bcltype/stringnative.cpp#L171-L187 ``` static FCDECL3(INT32, Marvin32HashString, StringObject* thisRefUNSAFE, INT32 strLen, INT64 additionalEntropy); ``` ## `System.Single` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/Single.cs#L166-L175 ``` [System.Security.SecuritySafeCritical] // auto-generated public unsafe override int GetHashCode() { float f = m_value; if (f == 0) { // Ensure that 0 and -0 have the same hash code return 0; } int v = *(int*)(&f); return v; } ``` ## `System.ValueType` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/mscorlib/src/System/ValueType.cs#L71-L83 ``` /*=================================GetHashCode================================== **Action: Our algorithm for returning the hashcode is a little bit complex. We look ** for the first non-static field and get it's hashcode. If the type has no ** non-static fields, we return the hashcode of the type. We can't take the ** hashcode of a static member because if that member is of the same type as ** the original type, we'll end up in an infinite loop. **Returns: The hashcode for the type. **Arguments: None. **Exceptions: None. ==============================================================================*/ [System.Security.SecuritySafeCritical] // auto-generated [MethodImplAttribute(MethodImplOptions.InternalCall)] public extern override int GetHashCode(); ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/ecalllist.h#L240 ``` FCFuncElement("GetHashCode", ValueTypeHelper::GetHashCode) ``` https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/comutilnative.h#L248 https://github.com/dotnet/coreclr/blob/b78b71f220ccb28eb6a5f9ea903536bdb6cd3f3d/src/vm/comutilnative.cpp#L2778-L2823 ``` static FCDECL1(INT32, GetHashCode, Object* objRef); ``` --

※ 发信站: 批踢踢实业坊(ptt.cc), 来自: 70.181.102.71 ※ 文章网址: https://webptt.com/cn.aspx?n=bbs/C_Sharp/M.1479824992.A.429.html ※ 编辑: AmosYang (70.181.102.71), 11/22/2016 22:32:57

※ 发信站: 批踢踢实业坊(ptt.cc) ※ 转录者: AmosYang (70.181.102.71), 11/22/2016 22:34:31 之所以会写这篇，是因为与人聊到求职时「怎麽样才算上得了台面的作品？」我的主张是「看起来再简单的题目，事实上都是有深入钻研的空间。以新人来说，与其沾酱油式的东拼西凑样样通样样松，不如把一个小题目的来龙去脉搞清楚，打造『学习能力／热情』的铁证，而不是被动等别人来赏识自己。」是故，就挑了 .NET 里 System.Object.GetHashCode() 这个题目，以实作验证上述的主张。搞懂整个题目花了大约四小时，大部分时间是在摸索 coreclr 的结构，写文章花了大约十小时。待求职面试谈话对方作球给你杀时，这些吸收再反刍过的知识，就是嘴炮的弹药；讲个 20+ 分钟直接 combo OTK 应该不是问题 :D ※ 编辑: AmosYang (70.181.102.71), 11/22/2016 23:01:02

1^F：推 cutekid: 超强大 11/22 22:52

2^F：推 robler: 一般人要有需要比较物件时通常会试者实作IComparable介面 11/22 22:59

3^F：→ robler: 因为GetHashCode很难用来实际比较物件XD 11/22 23:00

是的, GetHashCode() 是协助比较物件「相同、相异」，主要应用在与 hash 有关的地方 IComparable 则是除了比较相同、相异之外，更进一步有「顺序(order)」的意味；更适合应用在「排序」上。易言之，两者有微妙的差异。 :) ※ 编辑: AmosYang (70.181.102.71), 11/22/2016 23:06:22

4^F：推 aoksc: 先推再说 11/22 23:15

5^F：推 TSW: 虽然我觉得没什麽使用上的必要，不过还是感谢分享心得。 11/23 01:30

是的，「使用上的必要」是看身处在整个产业链中哪个环节；这类知识在上游比较有用，或着在资料数量 n 够大时才比较有用。

6^F：推 erspicu: 比较好奇的是取得物件的HASH 能够应用在哪些问题上? 11/23 02:22

「能够应用在哪些问题上?」就是文中所提及的「需要测试物件『相等性(equality)』的场合， GetHashCode() 就 *有可能* 派上用场」就逻辑上来说，这并 **没有** GetHashCode() ＝测试物件相等性(equality)的终极解决方案的意思。也就是 GetHashCode() ≠ 测试物件相等性(equality)的终极解决方案易言之，更完整的说法为 GetHashCode() 这样的设计虽然增加了整体复杂度，但它开拓了一块空间，提供改善「测试物件相等性(equality)」的效能的机会与可能性例如，在 System.Collections.Generic.Dictionary<TKey,TValue> 的原始码中， https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Collections/Generic/Dictionary.cs 就可以看到许多使用 GetHashCode() 的场合若仔细去追 .NET core 的原始码，会发现 hash, hash table （以及衍生出的 set, hash set, dictionary, ...) 通通都依赖适当的 caching, 适当的 hash 演算法选择，才能作到那些资料结构宣称的各种 big-O 时间界限(boundary)；而这些资料结构更是被 .NET Framework 其他部分, 例如, LINQ, 使用是故，身为 System.Object 上最古老的四人之一, GetHashCode(), 被所有类别继承着 (*1) ；若对软体业产业上游, framework design, 这方向有兴趣的话，这里面是有极多用血汗泪堆屍堆出来的知识可挖的 :D *1: 技术上来说, 在 Windows Runtime 的领域里，这句话不为真…但那是另一个话题了 :D ※ 编辑: AmosYang (70.181.102.71), 11/23/2016 05:36:36 更简单地说，如果有一天，你在处理某个问题时，资料数量 n 大到让你开始注意到官方文件 (如下) 不是说 Dictionary 的 this[key] 应该是 (amortized) O(1) ？为什麽我用起来感觉不是 (amortized) O(1) > Getting or setting the value of this property approaches an O(1) operation 配合 Dictionary 的原始码 https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Collections/Generic/Dictionary.cs#L326-L338 再配合上面原文里的官方文件与 Eric Lippert 的文章，你就会知道该去检视你放进 Dictionary 的物件的 GetHashCode() 与 Equals() 是不是出包了 :D ※ 编辑: AmosYang (70.181.102.71), 11/23/2016 05:50:04

7^F：推 ichico: 只能跪着推了 11/23 09:24

8^F：推 kalaja: 认真推 11/23 10:25

9^F：→ anguso: blizzard用 C#? 11/23 12:26

如果 C# / .NET 是适当的工具，就用。

10^F：推 sing10407: 推推 11/23 13:16

※ 编辑: AmosYang (70.181.102.71), 11/23/2016 15:13:26

11^F：推 petercoin: 跪着推但是393行好像有乱码? 11/23 15:35

大概是 PTT 这边的问题，无法正确处理所有的 unicode... 在 UTF8 编码的 .html 与 .md 里看起来是正常的 :) * http://www.30abysses.com/TWY/2016/11/21/c_sharp-gethashcode.html * http://www.30abysses.com/TWY/2016/11/21/c_sharp-gethashcode.md ※ 编辑: AmosYang (70.181.102.71), 11/23/2016 16:01:34

12^F：→ TSW: 「资料数量 n 够大时」应该会有十分完整的 index 跟 compare 11/23 17:26

13^F：→ TSW: 机制，感觉更用不到的说@@ 11/23 17:26

同意，「 n 够大」可以有很多解释；例如，大到单一机器(的记忆体)「装不下」，也是一种大。大到一般人只能在抽象层面理解的数量级，也是一种大，等等。然而，所谓「完整的 index 跟 compare 机制」，中的「 compare 机制」，即使不是就真的去覆写(override) GetHashCode() 这个方法(method), 我很难想像一个「用不到类似 hashing 相关作法的、为了处理大量资料的 compare 机制」当然，是有可能因为 indexing 作得很棒，以致於所有需要 compare 的场合通通不需要更进一步的优化，这个理论上的确是可能的；但在实务上，我觉得很难实现，尤其是「一次到位」就作好。中间应该会有段过渡期，是连系统、流程本身的规格都会继续演化，然後无可避免的，一些 local **temporary** optimization 最後就为成为这系统无法动摇的一部分 XD 就如下面这句话说的: > "Nothing is more permanent than a temporary solution." > 没有比「暂时性方案」更恒远的事物了。我 100% 同意以下这句话视情况，有比『从 GetHashCode() 着手』更好的解决方案」。但实务上，大概还是取决於天时地利人和 :D ※ 编辑: AmosYang (70.181.102.71), 11/24/2016 02:05:20 补充 > 实务上，我觉得很难实现，尤其是「一次到位」就作好例如, ｎ本身的数量级会演化，对於「ｎ有多大」的解释也会演化，值不值得去作 indexing 的评估也会演化；通常是以三年、五年、十年为单位在演化。是故，实务上很难一次到位，除非作的东西够小 :D ※ 编辑: AmosYang (70.181.102.71), 11/24/2016 02:16:21

	[问题/行为] 猫晚上进房间会不会有憋尿问题
	Re: [闲聊] 选了错误的女孩成为魔法少女 XDDDDDDDDDD
	[正妹] 瑞典一张
	[心得] EMS高领长版毛衣.墨小楼MC1002
	[分享] 丹龙隔热纸GE55+33+22
	[问题] 清洗洗衣机
	[寻物] 窗台下的空间
	[闲聊] 双极の女神1 木魔爵
	[售车] 新竹 1997 march 1297cc 白色四门
	[讨论] 能从照片感受到摄影者心情吗
	[狂贺] 贺贺贺贺贺！岛村卯月！总选举NO.1
	[难过] 羡慕白皮肤的女生
	阅读文章
	[黑特]
	[问题] SBK S1安装於安全帽位置
	[分享] 旧woo100绝版开箱!!
	Re: [无言] 关於小包卫生纸
	[开箱] E5-2683V3 RX480Strix 快睿C1 简单测试
	[心得] 苍の海贼龙地狱执行者16PT
	[售车] 1999年Virage iO 1.8EXi
	[心得] 挑战33 LV10 狮子座pt solo
	[闲聊] 手把手教你不被桶之新手主购教学
	[分享] Civic Type R 量产版官方照无预警流出
	[售车] Golf 4 2.0 银色自排
	[出售] Graco提篮汽座（有底座）2000元诚可议
	[问题] 请问补牙材质掉了还能再补吗?(台中半年内
	[问题] 44th 单曲生写竟然都给重复的啊啊！
	[心得] 华南红卡/icash 核卡
	[问题] 拔牙矫正这样正常吗
	[赠送] 老莫高业初业 102年版
	[情报] 三大行动支付本季掀战火
	[宝宝] 博客来Amos水蜡笔5/1特价五折
	Re: [心得] 新鲜人一些面试分享
	[心得] 苍の海贼龙地狱麒麟25PT
	Re: [闲聊] (君の名は。雷慎入) 君名二创漫画翻译
	Re: [闲聊] OGN中场影片：失踪人口局 (英文字幕)
	[问题] 台湾大哥大4G讯号差
	[出售] [全国]全新千寻侘草LED灯, 水草

WEB批踢踢(PTT)

Soft_Job 板

[心得][.NET] GetHashCode()

热门看板

赞助商连结