看板Oversea_Job
标 题请教一些面试问题
发信站批踢踢参 (Thu Aug 23 14:00:59 2007)
转信站ptt!Group.NCTU!grouppost!Group.NCTU!ptt3
第一道题:
Given N computers networked together, with each computer storing N integers,
finds the "median" of all of the numbers. Assuming a computer can hold O(N)
integers. Also assume that there exists a computer on the network without
integers, that we can use to interface with the other computers storing the
integers.
以下是我想的解法:
1. Sort each of piles of numbers separately in an array -
题目有讲memory O(N), 所以假设in memory sort:
qsort - O(N * log N)
radix sort - O(N)
问题来了 请问可以作radix sort吗? 没学过这个sort 只是听说它可以O(N)
2. Use the extra computer as buffer, take the smallest number in each
computers
Construct a heap, each node has info: <data, source of data>
- O(N * logN) time, O(N) space
3. Take out root node from heap, peek the source, then get a new number
from that source computer - logN
Insert new number into heap - logN
Do it (N * N) / 2 -1 times, the root node's value is the answer
- O(N^2 * logN)
Total: O(N * logN) + O(N * logN) + O(N^2 * logN) = O(N^2 * logN)
不知到还有没有人有更棒的解法?? 好想知道啊~
第二道题:
How to fast check if a URL is visited by web crawler?
我看到的解法: hash table (有这麽简单吗@@)
直觉上来说好像不对劲
一个URL假设是20 char, 算20 bytes
假设Internet有5 billion pages -> 5 * 20 billion bytes = 100 billion bytes
= 100 GB
100GB(至少) hastable? 有没搞错?
我查了一下wikipedia 上面也是说Google有个URL server专门在作这个URL revisit
check
请问真的是用Hashing吗 还是Distributed Hashing??
--
※ 发信站: 批踢踢参(ptt3.cc)
◆ From: 141.158.245.93