
PHP String Truncation: Example of Handling Mixed Chinese and English Strings in UTF-8 or GBK Encoding

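Weibo posts have a length limit, counted as follows: a Chinese character counts as 2, an English letter as 1, a full-width character as 2, and a half-width character as 1.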

PHP's built-in strlen returns the number of bytes. For a UTF-8 encoded Chinese character it returns 3, which does not meet this requirement.

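mb_strlen computes the length according to the character set, so a UTF-8 Chinese character counts as 1. That still does not match the Weibo limit, where a Chinese character must count as 2.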
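To make the difference concrete, here is a minimal sketch comparing the two built-in functions on a two-character Chinese string. It assumes the source file is saved as UTF-8 and that the mbstring extension is enabled; the variable names are only for illustration.

```php
<?php
// Two Chinese characters; assumes this file is saved as UTF-8.
$text = "你好";

// strlen() counts bytes: each of these characters occupies 3 bytes in UTF-8.
$byteCount = strlen($text);

// mb_strlen() counts characters in the given encoding (needs the mbstring extension).
$charCount = mb_strlen($text, 'UTF-8');

// Weibo-style counting would need 4 here (2 per Chinese character),
// which neither built-in function returns.
echo "bytes: $byteCount, characters: $charCount\n"; // bytes: 6, characters: 2
```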

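After some searching on Google, I found a class in Discuz that truncates strings in various encodings, adapted it, and tested it. The $charset parameter supports only gbk and utf-8.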

Usage example:

$a = "s＠@你好";
var_dump(strlen_weibo($a, 'utf-8'));

The output is 8: the letter s counts as 1, the full-width ＠ as 2, the half-width @ as 1, and the two Chinese characters as 4.
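The total of 8 can be checked character by character. The sketch below is not the article's function: weibo_breakdown is a hypothetical helper that splits a UTF-8 string with preg_split and applies the same rule (1 for a single-byte character, 2 for any multi-byte character). It assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration only.
// Returns one row per character: [character, byte length, weight].
function weibo_breakdown(string $s): array
{
    // The "u" flag makes PCRE treat the subject as UTF-8, so an empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8
    }
    $rows = [];
    foreach ($chars as $ch) {
        $bytes = strlen($ch);
        $rows[] = [$ch, $bytes, $bytes === 1 ? 1 : 2];
    }
    return $rows;
}

$total = 0;
foreach (weibo_breakdown("s＠@你好") as [$ch, $bytes, $weight]) {
    echo "$ch: $bytes byte(s), weight $weight\n";
    $total += $weight;
}
echo "total: $total\n"; // total: 8
```

The full-width ＠ and both Chinese characters each take 3 bytes in UTF-8, so they fall into the weight-2 case, while s and @ are single bytes.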

The source code is as follows:

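function strlen_weibo($string, $charset = 'utf-8')
{
    $n = $count = 0;
    $length = strlen($string); // length in bytes

    if (strtolower($charset) == 'utf-8')
    {
        while ($n < $length)
        {
            // The lead byte tells how many bytes the current character occupies.
            $currentByte = ord($string[$n]);
            if ($currentByte == 9 ||
                $currentByte == 10 ||
                (32 <= $currentByte && $currentByte <= 126))
            {
                // Tab, line feed, or printable ASCII: 1 byte, counts as 1.
                $n++;
                $count++;
            } elseif (194 <= $currentByte && $currentByte <= 223)
            {
                // 2-byte sequence: counts as 2.
                $n += 2;
                $count += 2;
            } elseif (224 <= $currentByte && $currentByte <= 239)
            {
                // 3-byte sequence (Chinese and full-width characters): counts as 2.
                $n += 3;
                $count += 2;
            } elseif (240 <= $currentByte && $currentByte <= 247)
            {
                // 4-byte sequence: counts as 2.
                $n += 4;
                $count += 2;
            } elseif (248 <= $currentByte && $currentByte <= 251)
            {
                // Legacy 5-byte lead byte (not valid in current UTF-8): counts as 2.
                $n += 5;
                $count += 2;
            } elseif ($currentByte == 252 || $currentByte == 253)
            {
                // Legacy 6-byte lead byte (not valid in current UTF-8): counts as 2.
                $n += 6;
                $count += 2;
            } else
            {
                // Any other byte (other control characters, stray bytes): counts as 1.
                $n++;
                $count++;
            }
            if ($count >= $length)
            {
                break;
            }
        }
        return $count;
    } else
    {
        // Any other $charset value is handled as GBK.
        for ($i = 0; $i < $length; $i++)
        {
            if (ord($string[$i]) > 127)
            {
                // Lead byte of a double-byte character: skip the trail byte
                // and add 1 extra, so the character counts as 2 in total.
                $i++;
                $count++;
            }
            $count++;
        }
        return $count;
    }
}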
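Since the title mentions truncation but the function above only counts, here is a minimal sketch of how the same counting rule could be used to cut a string at a limit. truncate_weibo is a hypothetical helper, not taken from Discuz or the original post; it handles UTF-8 only and assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration only: returns the longest prefix of
// a UTF-8 string whose Weibo-style length (1 per single-byte character,
// 2 per multi-byte character) does not exceed $limit.
function truncate_weibo(string $s, int $limit): string
{
    // Split into whole characters so a multi-byte character is never cut in half.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // input was not valid UTF-8
    }
    $out = '';
    $count = 0;
    foreach ($chars as $ch) {
        $weight = strlen($ch) === 1 ? 1 : 2;
        if ($count + $weight > $limit) {
            break; // adding this character would exceed the limit
        }
        $out .= $ch;
        $count += $weight;
    }
    return $out;
}

echo truncate_weibo("s＠@你好", 5), "\n"; // s＠@
echo truncate_weibo("s＠@你好", 6), "\n"; // s＠@你
```

With a limit of 5, the first Chinese character would bring the count to 6, so the result stops after the half-width @.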
