The best BT search engines: looking for the most complete and best BT resource search engine available today

Posted: 2021-12-08 16:15 (Wednesday) · Author: admin

玉芮静OG

It has pretty much everything: *kkeeu*. There are no real restrictions, so take a look.

须发工人

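1. DiggBT: a very capable BT resource search site. The interface is clean and ads appear occasionally, but as a resource search it is powerful and finds a great many resources. 2. 囧搜 (Jiongsou): a BT resource search site with a simple, ad-free interface…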

玉宇琼楼av

I have never used one, so I can't help you.

卷展祝水丹

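I strongly recommend the latest version of 哇嘎 (Vagaa); it is good and has plenty of resources. You can download the software from 华军软件园. You can also search on BT release sites. I know of a good one…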

西红柿番茄炒

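http://bt.5qzone.net This one is quite good and has pretty much everything, with many seeds for any resource. Try it if you don't believe me; it really is good.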

晏兰梦02U

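Personally I think the best and most convenient search engine is Baidu. Baidu is the easiest search engine to use: very handy, fast, and accurate. Below are some usage tips and precautions for mobile phones. Mobile phone tips: our phones stay with us all day, and phones are everywhere in daily life…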

大宝贝116

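It must have been scattered by the night wind. Only, the wind carelessly blew a whole July far away. The little path paved with the shade of locust trees: it seems I have not walked it for a long time; like July, I neglected it. Neglected along with it was my passion. In the end it was all for things that do not matter yet still tug at the mind. When…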

http://www.my-webworld.cn/ Try looking on this site.

yiqiandu

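http://bt.5qzone.net/ 5Q BT (http://bt.5qzone.net) is the most popular section of the 5Q site. Many people may not know much about BT yet. BT stands for BitTorrent download; it is open-source P2P software for multi-source downloading and is very easy to use, just like…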

anonymous

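Passwords for BT archive files like this mostly have to be obtained by signing up as a member or by paying. Some people even pay to join a group and still never get the password; only then do they realize it was a scam that uses H content as bait to make money.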

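Front end (client): ASP.NET MVC, EF 6.0

Back end, Java system:

* J2EE core components: JSP, Servlet, JDBC, EJB, JNDI
* Data exchange: XML
* Front-end pages: HTML, DHTML, Vue, Ext (open-source framework)
* Controller layer: Spring MVC
* Business logic layer: Spring core
* Persistence layer: MyBatis, Hibernate
* Middleware: EJB (2.0)
* Operating system: Windows 2008
* Database: DB2, Oracle
* Application server: JBoss AS 7, Tomcat
* Development tools: WebSphere Studio Application Developer (WSAD), Eclipse
* Search: ES 6.22 + Solr, dual engines

.NET system: built with Enterprise Library 5.0 + EF 6.0, combined with a WCF architecture.

How it works:

The back end is a crawler system written in Java.

More than 80 million HASH records.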

 

Implementing a DHT network crawler

DHT protocol principles and some key points:

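   To build a DHT crawler, you first need to understand DHT thoroughly; only then will you know which algorithm to apply to which problem. For protocol details and important reference articles, see the references at the end of this article.

   As an auxiliary to the BT protocol, the DHT protocol is a lot of fun. Its main purpose is to obtain seeds (torrents) or BT resources when a BT download actually starts. A traditional network needs a central server to store seeds or BT resources, which not only wastes server resources but is also prone to all kinds of single-point problems. The DHT network is designed to be decentralized: at any moment some nodes in the network are alive, and you can query those live nodes to join the DHT network yourself.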

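   Implementing a crawler on the DHT protocol takes three main steps. First, obtain resource information (the infohash: 160 bits, 20 bytes, which can be encoded as a 40-character hexadecimal string). Second, confirm that these infohashes are valid. Third, use the valid infohashes to download the BT seed file, which gives a complete description of the resource.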
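   The size relationship in the first step (160 bits, 20 bytes, 40 hex characters) can be checked with a few lines of Python. This is only a sketch: `fake_info` is a made-up stand-in for a torrent's bencoded info dictionary, and the fact that an infohash is the SHA-1 of that dictionary comes from the BitTorrent specification rather than from this article.

```python
import hashlib

# Hypothetical stand-in for the bencoded "info" dictionary of a torrent.
fake_info = b"d6:lengthi1024e4:name8:demo.bine"

infohash = hashlib.sha1(fake_info).digest()  # raw 20-byte (160-bit) value
hex_form = infohash.hex()                    # 40-character hexadecimal string

print(len(infohash) * 8, len(infohash), len(hex_form))  # 160 20 40

# The hex form is lossless: it converts back to the same 20 bytes.
assert bytes.fromhex(hex_form) == infohash
```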

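   The first step is fed by other nodes sending get_peers requests to the crawler, and the second by other nodes sending announce_peer requests to the crawler. The third step can be done in several ways: for example, download the seed file by infohash from a site that stores seed files, or download it from the node that sent the announce_peer. How exactly to do it is up to your own crawler.

   The main operations in the DHT protocol: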

 

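   Mainly responsible for interacting with external nodes over UDP and for wrapping the requests and responses of the four basic operations.

   ping: checks whether a node is "alive"

   A crawler uses ping in two main places: when initializing the routing table, and when verifying that a node is alive

   find_node: sends a node lookup request to a node

   A crawler also uses find_node in two main places: when initializing the routing table, and when verifying that a bucket is alive

   get_peers: sends a resource lookup request to a node

   When another node sends a request to the crawler, the crawler not only responds like a normal node but also takes the resource's info_hash as an opportunity to get to know as many more nodes as possible. The last step of a get_peers lookup is normally announce_peer, but because a crawler cannot announce_peer, get_peers in effect degenerates into a find_node operation.

   announce_peer: notifies a node that you have started downloading a resource

   A crawler must not use announce_peer, because that amounts to announcing fake resources; the other side can easily tell from context whether you announced a fake resource and then ban you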
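   To make the operations above concrete, here is a minimal sketch of how the three queries a crawler actually sends could be built. The message layout (keys "t", "y", "q", "a" and the argument names "id", "target", "info_hash") follows BEP 5; the helper names `bencode` and `krpc_query`, the transaction IDs, and the sample IDs are assumptions made for illustration, not the author's code.

```python
def bencode(obj):
    """Encode ints, bytes/str, lists and dicts in bencode."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys must be byte strings, emitted in sorted order.
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in obj.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(obj))


def krpc_query(tid, method, args):
    """Build one KRPC query datagram (hypothetical helper)."""
    return bencode({"t": tid, "y": "q", "q": method, "a": args})


my_id = b"abcdefghij0123456789"   # sample 20-byte node ID
target = b"mnopqrstuvwxyz123456"  # sample 20-byte target / infohash

ping = krpc_query(b"aa", "ping", {"id": my_id})
find_node = krpc_query(b"ab", "find_node", {"id": my_id, "target": target})
get_peers = krpc_query(b"ac", "get_peers", {"id": my_id, "info_hash": target})

print(ping)
```

   Each resulting byte string would be sent as a single UDP datagram, for example with `socket.sendto`.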

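   A few key points about the DHT protocol need clarifying:

   1. Node IDs and infohashes use the same 160-bit representation. 160 bits means the whole ID space has 2^160 = 1461501637330902918203684832716283019655932542976 values, a 49-digit decimal number (about 1.46 × 10^48). A space this large is more than enough to hold your host nodes and any resource information.

   2. Every node has a routing table. Each routing table consists of a set of K-buckets; a K-bucket holds at most K nodes, 8 by default. The buckets are stored in a structure resembling a prefix tree. With K = 8, a routing table has at most 160 - 4 K-buckets.

   3. Under the DHT protocol every infohash has a position, so two infohashes have a distance between them, and that distance is expressed with XOR: infohash1 xor infohash2. The more high-order bits two values share, the closer they are, which makes the distance between two nodes fast to compute. What is this distance for? In a DHT network, the closer a node's ID is to a resource's infohash, the more likely that node is to hold information about the resource. Why? Because everyone uses the same distance metric to recursively query nodes closer and closer to the resource, and once such a node responds it receives an announce. Nodes whose IDs are close to a resource's infohash are therefore more likely to learn that infohash.

   4. With this algorithm, DHT lookups proceed in jumps, quickly skipping over whole buckets of nodes to approach the target bucket. Lookups can jump far when they are distant from the target and only a little when they are close, because each node's routing table stores more nodes the closer they are to the node itself.

   5. Being a crawler in a DHT network is not easy. An ordinary crawler can fetch a resource as soon as it sees one; here, the ways of obtaining resources (get_peers, announce_peer) are all passive, so the crawler works differently. All it has to do is respond to other nodes' queries like a normal node, receive other nodes' responses, and collect the data in them. The only active thing a crawler can do is get to know as many other nodes as possible, so that more nodes come and query it.

   6. Some ask whether enlarging K in the crawler's K-buckets would increase the chance of obtaining resources. It would not. As analyzed above, the crawler's most important information sources are all passive, and you cannot enlarge other nodes' K. The farther away a node is, the less likely it is to store you, and the less likely it is to send you requests.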
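   The XOR distance from point 3 is easy to experiment with. The sketch below treats IDs as 20-byte strings; the function names `distance` and `shared_prefix_bits` are chosen here for illustration and do not come from the article.

```python
def distance(a: bytes, b: bytes) -> int:
    """XOR distance between two 160-bit IDs, read as an unsigned integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def shared_prefix_bits(a: bytes, b: bytes, bits: int = 160) -> int:
    """Number of identical high-order bits; more shared bits means closer."""
    return bits - distance(a, b).bit_length()


zero = bytes(20)                 # ID 0
near = bytes(19) + b"\x01"       # differs from zero only in the lowest bit
far = b"\x80" + bytes(19)        # differs from zero in the highest bit

print(distance(zero, near), shared_prefix_bits(zero, near))   # 1 159
print(distance(zero, far) == 2 ** 159, shared_prefix_bits(zero, far))  # True 0
```

   Because the distance is a plain integer, sorting candidate nodes by `distance(node_id, infohash)` gives the "closest first" order that lookups rely on.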

   Main components (the real implementation is more complex and has other modules; only the main ones are listed here):

   DHT crawler

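   This is the crawler's main logic. To simplify multithreading, it uses a producer-consumer model with the server: the crawler is the consumer, and it reuses the server's port.

   Its main job is initialization, including the routing table and the initial requests. It also handles all incoming message events. Thanks to the producer-consumer model, the operations inside are essentially single-threaded, which simplifies many problems and is presumably also faster than locking (strictly speaking, the locking has moved into the queue; for a producer that never stops producing, a ring buffer could improve performance substantially).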
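   A minimal sketch of that producer-consumer split, using only the standard library. The packets are canned byte strings standing in for UDP datagrams, and the names `server`, `crawler`, and `events` are illustrative assumptions rather than the author's module names.

```python
import queue
import threading

events = queue.Queue()  # the only structure shared between the two threads


def server(packets):
    """Producer: a real server would read these from the UDP socket."""
    for packet in packets:
        events.put(packet)
    events.put(None)  # sentinel that tells the consumer to stop


def crawler(handled):
    """Consumer: a single thread, so routing-table code needs no locks."""
    while True:
        message = events.get()
        if message is None:
            break
        handled.append(message)


handled = []
producer = threading.Thread(target=server, args=([b"ping", b"find_node", b"get_peers"],))
producer.start()
crawler(handled)
producer.join()

print(handled)
```

   All locking lives inside `queue.Queue`, which matches the remark that the lock has effectively moved into the queue.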

   DHT server

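   This is the server side of the crawler. A node in the DHT network is not just a client but also a server, so a server is needed to play the producer role. Initially each consumer had its own producer, but it turned out that IO multiplexing can deliver the message events, which greatly reduces the number of threads in the system. If possible, the client should be organized the same way, which should make the system much faster (not yet verified).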

   DHT route table

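   Mainly responsible for routing table operations.

   The routing table has the following operations:

   init: run when the routing table is first created. There are two cases:

   1. If the table was initialized before and its data was saved, just load the saved data.

   2. If it was never initialized, initialize it first.

   First you need an entry point: to join the network you must know some node i in it and add i to the routing table. Next, send i a find_node query for your own node ID. The clever part is that, in theory, a certain number of queries will find nodes very close to you (the process converges after a number of steps). The purpose of find_node here is to populate your table as early as possible and to let other nodes on the network learn about you. If others do not know you, they will not send you messages, which means you cannot collect the information you want.

   search: an important method, mainly used to locate the bucket that a given infohash falls into. It is called by the various other proxy methods.

   findNodes: finds the k nodes in the routing table closest to the given infohash

   getPeer: checks whether the requested resource has peers (that is, whether anyone is downloading it, i.e. whether anyone has announced it)

   announcePeer: records that the resource is being downloaded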

   DHT bucket:

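   acitiveNode: has the most logic, broken down as follows.

        1. Find the routing-table bucket for the node being added and check whether it is full; if it is not full, add the node

        2. If it is full, check whether the bucket contains the crawler's own node; if it does not, discard the node being added

        3. If the bucket does contain the crawler's own node, split the bucket evenly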
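   The three insertion rules can be sketched with buckets that each cover a numeric range of the ID space. This is a simplified illustration, not the author's prefix-tree implementation: the class names are invented here, IDs are plain integers, and the demo uses a tiny 8-bit ID space so the ranges are easy to read.

```python
K = 8  # bucket capacity


class Bucket:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi  # covers lo <= node_id < hi
        self.nodes = []


class RoutingTable:
    def __init__(self, own_id, bits=160):
        self.own_id = own_id
        self.buckets = [Bucket(0, 1 << bits)]

    def _find(self, node_id):
        return next(b for b in self.buckets if b.lo <= node_id < b.hi)

    def add(self, node_id):
        bucket = self._find(node_id)
        if node_id in bucket.nodes:
            return True
        if len(bucket.nodes) < K:                 # rule 1: room left, just add
            bucket.nodes.append(node_id)
            return True
        if not bucket.lo <= self.own_id < bucket.hi:
            return False                          # rule 2: full and not ours, discard
        mid = (bucket.lo + bucket.hi) // 2        # rule 3: split our bucket evenly
        left, right = Bucket(bucket.lo, mid), Bucket(mid, bucket.hi)
        for n in bucket.nodes:
            (left if n < mid else right).nodes.append(n)
        i = self.buckets.index(bucket)
        self.buckets[i:i + 1] = [left, right]
        return self.add(node_id)                  # retry in the new, smaller bucket


table = RoutingTable(own_id=0, bits=8)   # tiny 8-bit ID space for the demo
for n in range(200, 208):                # fill the single initial bucket
    table.add(n)
accepted_near = table.add(10)            # forces a split, then fits
accepted_far = table.add(208)            # far bucket is full, so discarded

print(accepted_near, accepted_far, [(b.lo, b.hi) for b in table.buckets])
```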

   Others, such as locateNode, replaceNode, updateNode, and removeNode, are not described one by one

   DHT torrent parser  

   Parses the following key information from a BT seed file: name, size, and file list (sub file name, sub file size). This is fairly simple: just decode the bencode
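   A small bencode decoder is enough to pull out those fields. The sketch below is an assumption-laden illustration: `bdecode` and `parse_torrent` are names invented here, the key names ("info", "name", "length", "files", "path") come from the BitTorrent metainfo format, and `raw` is a hand-built two-file torrent, not real data.

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                                   # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c in (b"l", b"d"):                           # list or dictionary
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        if c == b"l":
            return items, i + 1
        return dict(zip(items[0::2], items[1::2])), i + 1
    colon = data.index(b":", i)                     # byte string: <len>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def parse_torrent(raw):
    """Return (name, total_size, [(sub_file_name, sub_file_size), ...])."""
    info = bdecode(raw)[0][b"info"]
    name = info[b"name"].decode("utf-8", "replace")
    if b"files" in info:                            # multi-file torrent
        files = [(b"/".join(f[b"path"]).decode("utf-8", "replace"), f[b"length"])
                 for f in info[b"files"]]
    else:                                           # single-file torrent
        files = [(name, info[b"length"])]
    return name, sum(size for _, size in files), files


# Hand-built sample with two files, for illustration only.
raw = (b"d4:infod5:filesl"
       b"d6:lengthi10e4:pathl5:a.txtee"
       b"d6:lengthi20e4:pathl3:sub5:b.txtee"
       b"e4:name4:demoee")

print(parse_torrent(raw))
```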

   Utils

   distance: computes the distance between two resources. In Kademlia it is expressed as a xor b

 

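   To make things harder, I chose Python, a language I was not very familiar with. As a result I had to proceed carefully step by step, but I also came to appreciate how concise and powerful Python is. The implementation raised many interesting problems, for example how to store all the buckets of a routing table. I came up with several approaches and, to save resources, even planned to store them directly with a bit array plus dict, but dropped that idea because the eventual operations looked inconvenient, unintuitive, and error-prone. The structure I chose is a prefix tree, and working with it has indeed been frictionless.

   On timeouts, such as bucket timeouts and node timeouts, I kept looking for an approach that is efficient yet reasonably elegant. One option is a synchronous call that waits for its own timeout, but that is clearly inefficient, especially since I do not use extra threads: once it blocks, every event on that port is blocked. So the operations must be asynchronous, but precise timing is hard to control with asynchronous operations. I could check for timeouts every time an event arrives, but that is also wasteful and inefficient. What remains is an approach similar to Tomcat's: add a thread that monitors timeouts. Ideally this monitor thread is global and watches the timeouts of all transactions in all crawlers. Also note that badly handled timeouts can leave memory unreclaimed and cause leaks, and that it is worth checking carefully whether the timeout thread interferes with other threads.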

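   At first the timeout handling was poor and a ping storm appeared. After running for a while most buckets are full, and if you follow the protocol as written, a large share of events are pings to confirm that nodes are OK, so a lot of CPU goes to handling pings and ping responses. On closer thought, checking node status is unnecessary for a crawler: node status exists only so that you can hand good nodes to those who ask. So each incoming node can simply replace the oldest node in its bucket, and the table will always hold the newest nodes.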
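   The "replace the oldest node instead of pinging" policy maps naturally onto an ordered dictionary. This is a minimal sketch under that assumption; `FreshBucket` and `touch` are hypothetical names, and the demo uses a capacity of 3 instead of 8 to keep the output short.

```python
from collections import OrderedDict


class FreshBucket:
    """Bucket that always keeps the most recently seen nodes, without pinging."""

    def __init__(self, k=8):
        self.k = k
        self.nodes = OrderedDict()  # node_id -> address, oldest entry first

    def touch(self, node_id, addr):
        if node_id in self.nodes:
            self.nodes.move_to_end(node_id)   # seen again: now the newest
        elif len(self.nodes) >= self.k:
            self.nodes.popitem(last=False)    # full: evict the oldest entry
        self.nodes[node_id] = addr


bucket = FreshBucket(k=3)
for node_id in (1, 2, 3):
    bucket.touch(node_id, ("10.0.0.%d" % node_id, 6881))
bucket.touch(1, ("10.0.0.1", 6881))   # node 1 is heard from again
bucket.touch(4, ("10.0.0.4", 6881))   # bucket is full, so node 2 (oldest) is dropped

print(list(bucket.nodes))  # [3, 1, 4]
```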

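   The search algorithm also puzzled me. In short, the purpose of searching is not to actually find resources but to get to know the nodes that are able to store you. Why "able to store you"? Because the farther a node is from you, the fewer buckets it has for your part of the ID space, so getting into its buckets is relatively hard. The search targets should therefore preferably be nearby nodes. Distant nodes may still store you, for example when a distant node is initializing or when nodes in its buckets time out, but the probability is lower. So the search should neither pick targets blindly nor restrict itself strictly to the neighborhood. It is a trade-off for which I have no good answer yet; for now, nearby searches always happen and distant searches happen with some probability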
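   One way to express "nearby always, distant with some probability" is sketched below. The article gives no concrete numbers, so `far_probability` and `near_bits` are made-up parameters, and `pick_search_targets` is a hypothetical helper; IDs are handled as integers.

```python
import random


def pick_search_targets(own_id, far_probability=0.1, near_bits=120, bits=160, rng=random):
    """Return find_node targets: always one nearby, sometimes one anywhere."""
    # Nearby: keep the high (bits - near_bits) bits of our own ID and
    # randomise only the low near_bits bits.
    targets = [own_id ^ rng.getrandbits(near_bits)]
    # Distant: with the given probability, add a target anywhere in the space.
    if rng.random() < far_probability:
        targets.append(rng.getrandbits(bits))
    return targets


own = (1 << 159) | 12345
print([hex(t) for t in pick_search_targets(own)])
```

   Raising `far_probability` trades some of the reliable nearby exposure for occasional entries in distant routing tables.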

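   One more point is search speed. The structure of the DHT network means that the nodes a given node knows are necessarily a limited set of nearby nodes, so the number of resources one node can collect in a given period is limited. You should therefore run multiple nodes crawling at the same time; the number of resources collected depends largely on how many nodes you run.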

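   The last thing worth optimizing is the findnodes method. The earlier approach took all the data out of a bucket, sorted it, and returned the first K entries, which does a lot of extra work. This is the classic top-N problem, and sorting clearly wastes time. Because the operation is so frequent, even though the total number of stored nodes is small ((160 - 4) * 8 = 1248), it still adds noticeable time. The algorithm I adopted comes from the paper 《可扩展的DHT网络爬虫设计和优化》 (Design and Optimization of a Scalable DHT Network Crawler). Its basic formula is IDi = IDj xor 2^(160 - i): given IDi and i you can derive IDj, and given IDi and IDj you can derive i. This makes it fast to find the other buckets near bucket A (the nodes in the bucket closest in level to A are the next closest to A), which is considerably more efficient than traversing everything and then searching.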
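   Two small pieces of this can be shown in Python. The first uses `heapq.nsmallest` as a generic top-K selection that avoids a full sort; it is a stand-in, not the bucket-walking method from the paper. The second simply checks the XOR identity quoted above. The function names are invented for illustration, and IDs are integers.

```python
import heapq


def closest_nodes(node_ids, target, k=8):
    """Top-K by XOR distance without sorting the whole candidate list."""
    return heapq.nsmallest(k, node_ids, key=lambda n: n ^ target)


def flip_bit(id_j, i, bits=160):
    """ID_i = ID_j xor 2^(bits - i)."""
    return id_j ^ (1 << (bits - i))


def recover_i(id_i, id_j, bits=160):
    """Given ID_i and ID_j that differ in exactly one bit, recover i."""
    return bits - (id_i ^ id_j).bit_length() + 1


print(closest_nodes([1, 2, 3, 8, 9, 15], target=8, k=3))  # [8, 9, 15]

id_j = 5
id_i = flip_bit(id_j, 3, bits=8)       # 5 xor 2^(8 - 3) = 5 xor 32
print(id_i, recover_i(id_i, id_j, bits=8))  # 37 3
```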

  

    DHT protocol: http://www.bittorrent.org/beps/bep_0005.html and its translation: http://gobismoon.blog.163.com/blog/static/5244280220100893055533/

    A web crawler based on the DHT protocol: http://codemacro.com/2013/05/19/crawl-dht/

    An analysis of how the DHT protocol works, very good and worth reading: http://blog.sina.com.cn/s/blog_5384aaf00100a88k.html

 

BitTorrent uses a “distributed sloppy hash table” (DHT) for storing peer contact information for “trackerless” torrents. In effect, each peer becomes a tracker. The protocol is based on Kademlia [1] and is implemented over UDP.

Please note the terminology used in this document to avoid confusion. A “peer” is a client/server listening on a TCP port that implements the BitTorrent protocol. A “node” is a client/server listening on a UDP port implementing the distributed hash table protocol. The DHT is composed of nodes and stores the location of peers. BitTorrent clients include a DHT node, which is used to contact other nodes in the DHT to get the location of peers to download from using the BitTorrent protocol.

Overview

Each node has a globally unique identifier known as the “node ID.” Node IDs are chosen at random from the same 160-bit space as BitTorrent infohashes[2]. A “distance metric” is used to compare two node IDs or a node ID and an infohash for “closeness.” Nodes must maintain a routing table containing the contact information for a small number of other nodes. The routing table becomes more detailed as IDs get closer to the node’s own ID. Nodes know about many other nodes in the DHT that have IDs that are “close” to their own but have only a handful of contacts with IDs that are very far away from their own.

In Kademlia, the distance metric is XOR and the result is interpreted as an unsigned integer. distance(A,B) = |A xor B|. Smaller values are closer.

When a node wants to find peers for a torrent, it uses the distance metric to compare the infohash of the torrent with the IDs of the nodes in its own routing table. It then contacts the nodes it knows about with IDs closest to the infohash and asks them for the contact information of peers currently downloading the torrent. If a contacted node knows about peers for the torrent, the peer contact information is returned with the response. Otherwise, the contacted node must respond with the contact information of the nodes in its routing table that are closest to the infohash of the torrent. The original node iteratively queries nodes that are closer to the target infohash until it cannot find any closer nodes. After the search is exhausted, the client then inserts the peer contact information for itself onto the responding nodes with IDs closest to the infohash of the torrent.

The return value for a query for peers includes an opaque value known as the “token.” For a node to announce that its controlling peer is downloading a torrent, it must present the token received from the same queried node in a recent query for peers. When a node attempts to “announce” a torrent, the queried node checks the token against the querying node’s IP address. This is to prevent malicious hosts from signing up other hosts for torrents. Since the token is merely returned by the querying node to the same node it received the token from, the implementation is not defined. Tokens must be accepted for a reasonable amount of time after they have been distributed. The BitTorrent implementation uses the SHA1 hash of the IP address concatenated onto a secret that changes every five minutes and tokens up to ten minutes old are accepted.
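A token scheme along these lines can be sketched as follows. The paragraph above fixes only the ingredients (SHA1 over the IP address and a secret that changes every five minutes, with tokens up to ten minutes old accepted); the class name `TokenIssuer`, the 16-byte secret, and the injectable clock are assumptions made for this example.

```python
import hashlib
import os
import time


class TokenIssuer:
    ROTATE = 300  # seconds between secret changes (five minutes)

    def __init__(self, now=time.time):
        self.now = now
        self.secret = os.urandom(16)
        self.previous = self.secret
        self.rotated_at = now()

    def _rotate_if_due(self):
        periods = int((self.now() - self.rotated_at) // self.ROTATE)
        if periods >= 1:
            # If more than one period passed, the old secret is too old to keep.
            self.previous = self.secret if periods == 1 else os.urandom(16)
            self.secret = os.urandom(16)
            self.rotated_at += periods * self.ROTATE

    def _make(self, ip, secret):
        return hashlib.sha1(ip.encode() + secret).digest()

    def issue(self, ip):
        """Token to return in a get_peers response to the node at `ip`."""
        self._rotate_if_due()
        return self._make(ip, self.secret)

    def valid(self, ip, token):
        """Check a token presented in announce_peer against the sender's IP."""
        self._rotate_if_due()
        return token in (self._make(ip, self.secret), self._make(ip, self.previous))


clock = [0.0]                                  # fake clock for the demo
issuer = TokenIssuer(now=lambda: clock[0])
token = issuer.issue("1.2.3.4")
print(issuer.valid("1.2.3.4", token), issuer.valid("5.6.7.8", token))
```

Accepting both the current and the previous secret is what yields the "up to ten minutes old" window without storing any per-node state.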

Routing Table

Every node maintains a routing table of known good nodes. The nodes in the routing table are used as starting points for queries in the DHT. Nodes from the routing table are returned in response to queries from other nodes.

Not all nodes that we learn about are equal. Some are “good” and some are not. Many nodes using the DHT are able to send queries and receive responses, but are not able to respond to queries from other nodes. It is important that each node’s routing table must contain only known good nodes. A good node is a node that has responded to one of our queries within the last 15 minutes. A node is also good if it has ever responded to one of our queries and has sent us a query within the last 15 minutes. After 15 minutes of inactivity, a node becomes questionable. Nodes become bad when they fail to respond to multiple queries in a row. Nodes that we know are good are given priority over nodes with unknown status.

The routing table covers the entire node ID space from 0 to 2^160. The routing table is subdivided into “buckets” that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID “N” is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming “full.” When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.

When the bucket is full of good nodes, the new node is simply discarded. If any nodes in the bucket are known to have become bad, then one is replaced by the new node. If there are any questionable nodes in the bucket that have not been seen in the last 15 minutes, the least recently seen node is pinged. If the pinged node responds then the next least recently seen questionable node is pinged until one fails to respond or all of the nodes in the bucket are known to be good. If a node in the bucket fails to respond to a ping, it is suggested to try once more before discarding the node and replacing it with a new good node. In this way, the table fills with stable long running nodes.

Each bucket should maintain a “last changed” property to indicate how “fresh” the contents are. When a node in a bucket is pinged and it responds, or a node is added to a bucket, or a node in a bucket is replaced with another node, the bucket’s last changed property should be updated. Buckets that have not been changed in 15 minutes should be “refreshed.” This is done by picking a random ID in the range of the bucket and performing a find_node search on it. Nodes that are able to receive queries from other nodes usually do not need to refresh buckets often. Nodes that are not able to receive queries from other nodes usually will need to refresh all buckets periodically to ensure there are good nodes in their table when the DHT is needed.

Upon inserting the first node into its routing table and when starting up thereafter, the node should attempt to find the closest nodes in the DHT to itself. It does this by issuing find_node messages to closer and closer nodes until it cannot find any closer. The routing table should be saved between invocations of the client software.

BitTorrent Protocol Extension

The BitTorrent protocol has been extended to exchange node UDP port numbers between peers that are introduced by a tracker. In this way, clients can get their routing tables seeded automatically through the download of regular torrents. Newly installed clients who attempt to download a trackerless torrent on the first try will not have any nodes in their routing table and will need the contacts included in the torrent file.

Peers supporting the DHT set the last bit of the 8-byte reserved flags exchanged in the BitTorrent protocol handshake. Peers receiving a handshake indicating the remote peer supports the DHT should send a PORT message. It begins with byte 0x09 and has a two byte payload containing the UDP port of the DHT node in network byte order. Peers that receive this message should attempt to ping the node on the received port and IP address of the remote peer. If a response to the ping is received, the node should attempt to insert the new contact information into their routing table according to the usual rules.
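The two wire-level details in this paragraph can be sketched with the standard `struct` module. The helper names are invented for this example, and the 4-byte length prefix in front of the message ID comes from the general BitTorrent peer wire message framing rather than from the paragraph above.

```python
import struct


def set_dht_bit(reserved: bytes) -> bytes:
    """Set the last bit of the 8-byte reserved field in the handshake."""
    flags = bytearray(reserved)
    flags[7] |= 0x01
    return bytes(flags)


def port_message(udp_port: int) -> bytes:
    """PORT message: <length prefix = 3><id 0x09><2-byte port>, big-endian."""
    return struct.pack(">IBH", 3, 0x09, udp_port)


print(set_dht_bit(bytes(8)).hex())   # 0000000000000001
print(port_message(6881).hex())      # 00000003091ae1
```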

Torrent File Extensions

A trackerless torrent dictionary does not have an “announce” key. Instead, a trackerless torrent has a “nodes” key. This key should be set to the K closest nodes in the torrent generating client’s routing table. Alternatively, the key could be set to a known good node such as one operated by the person generating the torrent. Please do not automatically add “router.bittorrent.com” to torrent files or automatically add this node to clients’ routing tables.

nodes = [["<host>", <port>], ["<host>", <port>], ...]
nodes = [["127.0.0.1", 6881], ["your.router.node", 4804]]

KRPC Protocol

The KRPC protocol is a simple RPC mechanism consisting of bencoded dictionaries sent over UDP. A single query packet is sent out and a single packet is sent in response. There is no retry. There are three message types: query, response, and error. For the DHT protocol, there are four queries: ping, find_node, get_peers, and announce_peer.

A KRPC message is a single dictionary with two keys common to every message and additional keys depending on the type of message. Every message has a key “t” with a string value representing a transaction ID. This transaction ID is generated by the querying node and is echoed in the response, so responses may be correlated with multiple queries to the same node. The transaction ID should be encoded as a short string of binary numbers, typically 2 characters are enough as they cover 2^16 outstanding queries. The other key contained in every KRPC message is “y” with a single character value describing the type of message. The value of the “y” key is one of “q” for query, “r” for response, or “e” for error.

Contact Encoding

Contact information for peers is encoded as a 6-byte string. Also known as “Compact IP-address/port info” the 4-byte IP address is in network byte order with the 2 byte port in network byte order concatenated onto the end.

Contact information for nodes is encoded as a 26-byte string. Also known as “Compact node info” the 20-byte Node ID in network byte order has the compact IP-address/port info concatenated to the end.
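Decoding both compact formats takes only a few lines. The function names below are illustrative, and the sample blob is built by hand from a made-up node ID and address.

```python
import socket
import struct


def decode_peers(blob: bytes):
    """Split a string of 6-byte compact peer entries into (ip, port) pairs."""
    return [(socket.inet_ntoa(blob[i:i + 4]), struct.unpack(">H", blob[i + 4:i + 6])[0])
            for i in range(0, len(blob) - len(blob) % 6, 6)]


def decode_nodes(blob: bytes):
    """Split a string of 26-byte compact node entries into (node_id, ip, port)."""
    nodes = []
    for i in range(0, len(blob) - len(blob) % 26, 26):
        node_id = blob[i:i + 20]
        (ip, port), = decode_peers(blob[i + 20:i + 26])
        nodes.append((node_id, ip, port))
    return nodes


sample = b"abcdefghij0123456789" + socket.inet_aton("127.0.0.1") + struct.pack(">H", 6881)
print(decode_nodes(sample))
```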

Queries

Queries, or KRPC message dictionaries with a “y” value of “q”, contain two additional keys; “q” and “a”. Key “q” has a string value containing the method name of the query. Key “a” has a dictionary value containing named arguments to the query.

Responses

Responses, or KRPC message dictionaries with a “y” value of “r”, contain one additional key “r”. The value of “r” is a dictionary containing named return values. Response messages are sent upon successful completion of a query.

Errors

Errors, or KRPC message dictionaries with a “y” value of “e”, contain one additional key “e”. The value of “e” is a list. The first element is an integer representing the error code. The second element is a string containing the error message. Errors are sent when a query cannot be fulfilled. The following table describes the possible error codes:

Code
