PHP information scraping program code

The code is as follows:

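<?php
// URL of the listing (index) page to scrape
$url = "http://emotion.pclady.com.cn/skills/";

// Fetch the page source
$rs = file_get_contents($url);

// Debug: dump the fetched page to a file
//$fp = fopen("text.txt", "a");
//$fw = fwrite($fp, $rs);
//fclose($fp);

/* Sample link from the listing page:
<A href="http://emotion.pclady.com.cn/skills/0903/376476.html"
target=_blank>留住你身边的好男人</A> */

// Regular expression that matches each title link
$preg = '/<i\s+class=\"titles\"><a\s+href=\"[^>]+\">(.*)<\/a><\/i>/i';

// Run the regex search
preg_match_all($preg, $rs, $title);

// Count the titles found
$count = count($title[0]);
echo $count." ";

// Scrape one content page per title
for ($i = 0; $i < $count; $i++) {
    // Build the content page URL from the link tag
    $pr = '/<a\s+href=\"[^>]+\">/isU';
    preg_match_all($pr, $title[0][$i], $jurl);
    // Drop the leading '<a href="' (9 characters)
    $substr = substr($jurl[0][0], 9);
    // Drop the trailing 18 characters (the rest of the tag after the URL)
    $curl = substr($substr, 0, -18);

    // Fetch the content page source
    $c = file_get_contents($curl);

    // Regular expression for the content page
    $pc = '/<a\s+href=\"[^>]+\">/i';

    // Run the regex match
    preg_match($pc, $c, $content);

    // Output the title
    echo $title[0][$i]." ";
    echo $title[1][$i]." ";

    $concount = count($content[0]);
    echo $concount." ";
    echo $content[0][0];

    for ($j = 0; $j < $concount; $j++) {
    }
}
?>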

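Testing shows that $c already contains the source of the content page, so why does the regular expression in $pc match only the "<" character and nothing else? Is it because I used substr() above, or is something else wrong? Any pointers would be much appreciated.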
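
One likely explanation, offered as a sketch and not a definitive diagnosis: preg_match() stores only the first match, as a flat array, so $content[0] is a string and $content[0][0] is just its first character, "<". The substr() calls work on a different variable ($jurl) and are probably not involved. The snippet below uses a made-up $html string as a hypothetical stand-in for $c, with illustrative variable names, to compare preg_match() with preg_match_all():

```php
<?php
// Hypothetical stand-in for the content page source ($c in the post).
$html = '<p><a href="http://example.com/a.html">A</a> '
      . '<a href="http://example.com/b.html">B</a></p>';
$pc = '/<a\s+href="[^>]+">/i';

// preg_match() stops at the first match; $first[0] is a plain string.
preg_match($pc, $html, $first);
$firstTag  = $first[0];     // the whole first <a ...> tag
$firstChar = $first[0][0];  // string offset 0 of that tag

// preg_match_all() collects every match; $all[0] is an array of strings.
preg_match_all($pc, $html, $all);
$tags     = $all[0];
$tagCount = count($tags);

echo $firstChar, "\n";  // <
echo $firstTag, "\n";   // <a href="http://example.com/a.html">
echo $tagCount, "\n";   // 2
```

If every link on the content page is wanted, switching the posted call to preg_match_all() should make count($content[0]) and $content[0][0] behave as intended. If only the first link is wanted, $content[0] can be echoed directly. Note that with preg_match(), count($content[0]) is applied to a string instead of an array of matches, which PHP 8 rejects with a TypeError.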