CSSE board



Genome Model Applied to Software

By Danny O'Brien
02:00 AM Oct. 04, 2004 PT

What does uncovering the secret language of DNA have in common with reverse-engineering Microsoft software? Quite a lot, according to Marshall Beddoe, a security analyst who is turning to algorithms used in bioinformatics research to understand the arcane mysteries of closed, proprietary software.

Beddoe explained his methodology in a presentation at ToorCon, last week's San Diego security and hacker conference, showing how biologists' work can make the often tedious task of reverse-engineering network software a little simpler.

In his presentation, Beddoe noted that over the last 30 years, biologists have developed an impressive battery of algorithms to spot commonalities between samples of DNA.

"It suddenly became very obvious to me I could use the algorithms used in genomics in protocol analysis," said Beddoe. "So obvious, that I've been having real problems explaining to other engineers how it works."

Beddoe calls his approach "Protocol Informatics."

For years, reverse engineers have struggled to understand the arcane mysteries of proprietary software from the only clues they had: the way the code communicates over open networks. Groups like the open-source Samba project, whose code lets Unix systems interoperate with Windows file servers, have spent hundreds of volunteer hours laboriously probing and analyzing data packets emitted by Microsoft's software.

Reverse-engineering these protocols generally consists of looking through dozens of separate computer conversations, scouring them for patterns that repeat themselves and working out what those sequences might mean.

The trick, according to Beddoe, is to make the right connections between the terminology of genetics and protocol analysis. Much of bioinformatics is devoted to finding DNA sequences separated by long gaps of unknown data, then a continuation of a known sequence.
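As a rough illustration of the gap-tolerant matching described above, the sketch below aligns two messages with Needleman-Wunsch global alignment, a standard bioinformatics algorithm. The article does not say which algorithm Beddoe used, so the choice of algorithm, the scoring values (`match`, `mismatch`, `gap`) and the sample messages are all assumptions made for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences (Needleman-Wunsch).

    Returns both sequences as lists of equal length, with None marking
    positions where a gap was inserted. Scoring values are illustrative.
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align both
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


def render(aligned):
    """Show an aligned sequence of characters, printing '-' for each gap."""
    return "".join("-" if c is None else c for c in aligned)


if __name__ == "__main__":
    # Hypothetical text-protocol messages, not taken from any real capture.
    first, second = align("USER bob", "USER alice")
    print(render(first))
    print(render(second))
```

Columns where the two messages agree are candidates for fixed command fields, while columns padded with gaps or filled with mismatches are more likely to be variable data.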
Since much of DNA is filled with repeating, seemingly irrelevant noise, eliminating these gaps is a common problem in genomics. The same is true in protocol reverse-engineering. To researchers like Beddoe, network conversations are full of "junk" -- usually the actual data being sent -- which interferes with the analysis of the occasional command sequence that controls what to do with that junk.

Beddoe dug up some of the oldest algorithms in the bioinformatics armory, and used them to eliminate junk data among patterns of repeated commands.

Geneticists have also spent many years analyzing the rate of mutation between different DNA samples. Given two pieces of DNA, biologists have devised complex algorithms to discover whether they're descended from the same ancestors. The method works by comparing the genetic differences with the known mutation rates of certain DNA components.

Beddoe applied the same principles to his mutating network conversations. He notes, for example, that ASCII text is much more likely to "mutate" into other text than it is to mutate into something else.

By feeding in probabilities about text instead of DNA nucleotides, Beddoe discovered that he could more easily spot related fields in network exchanges. The genetics algorithms told him that some chunks of data were close relations; in fact, they were bits of the network protocol that were performing similar actions.

Geneticists have also had a head start in visualizing unimaginable heaps of data. Beddoe took the equations used by geneticists to display a species' family tree and created a family tree of his analyzed protocols. The result: a phylogenetic tree of Microsoft's SMB protocol, clumping interesting fragments together for further investigation.

Beddoe isn't the only one in the computer security world casting an envious eye over the bioinformatics sector's research.
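The family-tree step described above can be sketched as average-linkage agglomerative clustering, the idea behind the UPGMA method used for simple phylogenetic trees. This is a minimal sketch under stated assumptions: the article does not name the equations Beddoe used, `difflib.SequenceMatcher` stands in for an alignment-based distance, and the sample messages are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    """Stand-in dissimilarity in [0, 1]; 0 means identical byte strings.

    Assumption: a real tool would derive this from alignment scores.
    """
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def leaves(tree):
    """Flatten a nested-tuple tree into the list of messages it contains."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def average_linkage(x, y):
    """Mean pairwise distance between the leaves of two subtrees."""
    pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def build_tree(messages):
    """Merge the two closest clusters repeatedly until one tree remains.

    Expects at least one message; the result is a nested tuple of pairs.
    """
    clusters = list(messages)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


if __name__ == "__main__":
    # Hypothetical messages: two text commands and one binary blob.
    print(build_tree([b"USER alice", b"USER bob", b"\x00\x01\x02\x03"]))
```

Messages that end up as siblings near the bottom of the tree are the ones worth inspecting together, which is the "clumping" effect the article describes.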
Dan Kaminsky, senior security consultant for Avaya, said he has been investigating using genomic pattern analysis for identifying and clustering "mutant" machines on a corporate network: PCs whose variation from the company's standard installation might make them vulnerable to compromise.

Kaminsky thinks this is only the beginning for the spread of bioinformatics ideas into other fields. "Generating an ordered, hierarchal breakdown of interrelationships from huge piles of information is a problem that crops up everywhere. I'm not surprised to see bioinformatics solutions finally being applied to the rest of our poorly understood, oversized networks."

On the biology end, Terry Gaasterland, associate professor of computational genomics at Rockefeller University, agrees that there's a wide field of uses for the algorithms her discipline has developed -- and tricks to be learned by biologists from other fields, too.

"The problem of decoding the language of networks and the problem of finding signals in DNA are really two related instances of machine learning problems. We're almost bound to discover universal principles of information communication by investigating both," she said.

For the time being, though, Beddoe and others have one more decoding problem to battle: understanding the jargon of another field's documentation.

Justin Mason, the creator of SpamAssassin, is investigating bioinformatics approaches to spam identification. He said that to outsiders, the genomics world can seem more closed than the world of network engineers.

"A lot of the interesting research takes place in expensive journals and seminars that we can't really get hold of. It's a bit of a difference from the free exchange you get between coders online," he said. Beddoe himself deduced many of the algorithms he used from PowerPoint slides he downloaded from biologists' websites.
Gaasterland disagreed with Mason's assessment and said many bioinformatics papers become freely available six months after publication. She added that the publication of Beddoe's work might provide him with more assistance from the bioinformatics community.

That'll come as a relief to Beddoe, who until now assumed that biologists wouldn't pay much heed to his project.

<Wired News> http://www.wired.com/news/infostructure/0,1377,65191,00.html
--



※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 61.222.173.26






