Practical Python + PHPCMS V9

Python · 2023-01-31 01:01:06 · 592 views · 安东尼

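Reposted from the 臭脚丫丫 blog. I found it useful; sadly, the blogger seems to have shut the blog down.

I recently wrote a script that takes the jieba word-segmentation results for each article, compares the resulting term sets pairwise, scores each pair with the Jaccard similarity (雅比克距离) = size of the intersection / size of the union, and keeps the 10 highest-scoring matches per article, with synonyms taken into account. The calculation runs once a day, the results are stored in the database, and the site reads them from there.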

# coding=utf-8
# Python 2 script. Segments each article with jieba, compares the term sets pairwise,
# scores each pair with Jaccard similarity = |intersection| / |union| (synonym-aware),
# and stores the 10 most similar article ids per article in the database.
import sys
reload(sys)
import MySQLdb
sys.setdefaultencoding('utf-8')
sys.path.append('../')
import jieba
jieba.initialize()
jieba.load_userdict("userdict.txt")
import jieba.analyse
import re
import operator
import string
f_ex = open('Words.txt','rb')  # words to exclude
f_out = open('output.txt','wb')
words = [line.strip() for line in f_ex.readlines()]  # list comprehension
conn = MySQLdb.connect(host="localhost",user="root",passwd="",db="root",charset="utf8")
cursor = conn.cursor()
cursor.execute("select a.id,a.title,a.description from v9_ask_a a limit 10")
data = cursor.fetchall()   # fetch all rows
for x in data:
    f_out.write(str(x[0]).decode("utf-8")+'\t'+str(x[1]).decode("utf-8")+'\t')
    title = str(x[1]).decode("utf-8")
    tags_title = jieba.analyse.extract_tags(title, topK=10)
    for title_ex in tags_title:
        for title_word in words:
            if title_word.decode('gbk') == title_ex:
                title_ex = ''
        if title_ex != '':
            f_out.write(title_ex+'|')
    f_out.write('\r\n')
f_ex.close()
f_out.close()
cursor.execute("select * from v9_similar")  # load the synonym groups from the database
similar = cursor.fetchall()
newdata = {}
for similardata in similar:
    wordslist = similardata[1].decode("utf-8").split(',')
    for wordx in wordslist:
        newdata.setdefault(similardata[0],[]).append(wordx)  # one key, many values: each group id maps to a list of words
f_out = file('output.txt','rb')
result = f_out.readlines()
list1 = []
list2 = []
for x in result:
    relate_table = x.split('\t')
    list1.append(relate_table[0])
    list2.append(relate_table[2])
list3 = dict(zip(list1,list2))    # combine the two lists into a dict: article id -> "|"-joined terms
dic = {}
for x in list3:
    similarnum1 = []
    list3x = list3[x].decode("utf-8")
    new1 = list3x.split('|')
    for new1x in new1:
        for b in newdata:
            for c in newdata[b]:
                if c.decode("utf-8") == new1x:
                    similarnum1.append(b)  # record the matching synonym group id in list 1
    for y in list3:
        similarnum2 = []
        if list3[x] != list3[y]:
            list3y = list3[y].decode("utf-8")
            new2 = list3y.split('|')
            for new2x in new2:
                for e in newdata:
                    for f in newdata[e]:
                        if f.decode("utf-8") == new2x:
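                            similarnum2.append(e)  # record the matching synonym group id in list 2
            jiao = len(list(set(new1)&set(new2)))   # size of the intersection of the two term sets
            bing = len(list(set(new1)|set(new2)))   # size of the union of the two term sets
            simlen = len(list(set(similarnum1)&set(similarnum2)))  # number of synonym groups the two articles share
            jiao+=simlen
            bing-=simlen
            result = float(jiao)/float(bing)   # convert to float first to avoid integer division
            dic.setdefault(x, { })[y] = result  # one key, many values: each article id maps to a sub-dict of scores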
dic2 = {}
for x in dic:
    dic1 = dic[x]
    sorted_x = sorted(dic1.iteritems(), key=lambda item: item[1], reverse=True)    # sort the (id, score) pairs by score, descending
    sorted_x = sorted_x[0:10]   # keep the 10 highest scores
    for y in sorted_x:
        dic2.setdefault(x, []).append(y[0])  # y is an (id, score) tuple; keep only the id
for x in dic2:
    y = ",".join(dic2[x])   # join the related ids into a comma-separated string for the database
    x = int(x)  # convert the id string to an integer
    value = [y,x]
    cursor.execute('update v9_ask_a set relate_id=%s where id=%s',value)
    conn.commit()
cursor.close()
conn.close()

I find this very practical, so I am keeping a copy here.

1. PHPCMS settings
First add a workflow under Extensions, then configure the category so that publishing requires review.

2. Computing relevance with Python
One way to compute relevance is TF-IDF with cosine similarity: http://www.ruanyifeng.com/blog/2013/03/cosine_similarity.html
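For comparison, cosine similarity over two token lists can be sketched as follows. This is a minimal Python 3 illustration, not the author's code: it weights terms by raw counts rather than TF-IDF, and the function name `cosine_similarity` is made up for this example.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two token lists, using raw term counts.

    Assumption: raw counts stand in for TF-IDF weights to keep the sketch short.
    """
    vec_a, vec_b = Counter(words_a), Counter(words_b)
    # Dot product over the terms the two documents have in common.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```

Two documents with the same term distribution score close to 1, and documents with no shared terms score 0.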
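For now I use the Jaccard similarity coefficient (雅比克距离), that is, the size of the intersection of two sets divided by the size of their union, and I also take synonyms into account.
First segment each article and use jieba's built-in TF-IDF keyword extraction to pull out its 10 most important terms, then compare the articles pairwise.
There is not much to say about the Jaccard calculation itself, so I will explain how synonyms are taken into account.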
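Before synonyms enter the picture, the plain calculation can be sketched like this in Python 3. The names `jaccard` and `top_related` and the sample data are illustrative assumptions, not part of the original script.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity: |A & B| / |A | B|, or 0.0 when both sets are empty."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def top_related(article_tags, article_id, n=10):
    """Return the ids of the n articles most similar to article_id.

    article_tags maps article id -> list of extracted terms (hypothetical layout).
    """
    base = article_tags[article_id]
    scores = {
        other: jaccard(base, tags)
        for other, tags in article_tags.items()
        if other != article_id
    }
    return sorted(scores, key=lambda k: scores[k], reverse=True)[:n]


# Hypothetical sample data: article id -> extracted terms.
tags = {1: ["python", "cms"], 2: ["python", "cms", "php"], 3: ["linux"]}
related = top_related(tags, 1, n=2)
```

Here article 2 shares two of three distinct terms with article 1, so it ranks ahead of article 3, which shares none.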

(1) Create a table v9_similar in the database with two fields, id and words; words holds a group of synonyms separated by commas.

(2) Load id and words into a dictionary in which each key maps to several values.

cursor.execute("select * from v9_similar")
similar = cursor.fetchall()
newdata = {}     # newdata maps each synonym group id to its list of words
for similardata in similar:
    wordslist = similardata[1].decode("utf-8").split(',')
    for wordx in wordslist:
        newdata.setdefault(similardata[0],[]).append(wordx)
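An alternative layout, sketched here in Python 3, inverts the mapping so that each word points to its group id, which turns the later lookup into a single dictionary access. This is not the author's structure; the function names are hypothetical, and the sketch assumes each word belongs to at most one group (a later row would overwrite an earlier one).

```python
def build_word_to_group(rows):
    """Build a word -> group id index.

    rows: iterable of (group_id, "word1,word2,...") tuples, the shape the
    article describes for the v9_similar table.
    """
    word_to_group = {}
    for group_id, words in rows:
        for word in words.split(","):
            word = word.strip()
            if word:
                word_to_group[word] = group_id
    return word_to_group


def synonym_group_ids(tags, word_to_group):
    """Return the set of synonym group ids that the given terms fall into."""
    return {word_to_group[t] for t in tags if t in word_to_group}


# Hypothetical rows standing in for the contents of v9_similar.
index = build_word_to_group([(1, "computer,pc"), (2, "price,cost")])
groups = synonym_group_ids(["pc", "cheap", "cost"], index)
```

With the index in place, finding an article's groups costs one lookup per term instead of a scan over every group and every word.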

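(3) Compare each article's top 10 terms against the synonym groups; whenever a term appears in a group, record that group's id.

for x in list3:  # list3 maps each article id to its terms, joined with "|"
    similarnum1 = []
    list3x = list3[x].decode("utf-8")
    new1 = list3x.split('|')
    for new1x in new1:
        for b in newdata:
            for c in newdata[b]:
                if c.decode("utf-8") == new1x:
                    similarnum1.append(b)  # a synonym matched: store the group id in the list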

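This way, comparing each article against the synonym groups yields a list of synonym group ids per article. Intersecting the lists of two articles then tells you how many of their terms are synonyms of each other.

jiao = len(list(set(new1)&set(new2)))
bing = len(list(set(new1)|set(new2)))
simlen = len(list(set(similarnum1)&set(similarnum2)))
# simlen is the size of the intersection of the two articles' synonym group id lists
jiao+=simlen      # add simlen to the intersection count
bing-=simlen      # subtract simlen from the union count
result = float(jiao)/float(bing)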
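The adjustment can be checked on a small worked example. The Python 3 sketch below follows the formula above; the function name, the sample terms, and the handling of a non-positive denominator (treated as a full match) are assumptions made for illustration.

```python
def adjusted_similarity(tags_a, tags_b, groups_a, groups_b):
    """Jaccard score adjusted by the number of shared synonym groups."""
    tags_a, tags_b = set(tags_a), set(tags_b)
    shared_groups = len(set(groups_a) & set(groups_b))
    inter = len(tags_a & tags_b) + shared_groups
    union = len(tags_a | tags_b) - shared_groups
    if union <= 0:
        # Assumption: this only happens when the overlap is total,
        # so the sketch reports a full match instead of dividing by zero.
        return 1.0
    return inter / union


# "computer" and "pc" are assumed to sit in synonym group 1.
score = adjusted_similarity({"computer", "price"}, {"pc", "price"}, {1}, {1})
```

Plain Jaccard would give 1/3 here (one shared term out of three). With the shared synonym group, the intersection becomes 2 and the union becomes 2, so the score is 1.0. Note that a term present in both articles that also belongs to a synonym group is counted twice by this formula, so scores above 1 are possible.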

(4) Once the scores are computed, store the ids of the similar articles in the database table, separated by commas.

    value = [y,x]
    cursor.execute('update v9_news set relate=%s where id=%s',value)
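The ranking and storing step can be tried end to end without a MySQL server. The Python 3 sketch below swaps in the standard-library `sqlite3` module and an in-memory table as a stand-in for `v9_news`; `save_related` is a hypothetical helper name. Note that `sqlite3` uses `?` placeholders where MySQLdb uses `%s`.

```python
import sqlite3


def save_related(conn, scores, n=10):
    """Store the n best-scoring related ids per article as a comma-separated string.

    scores: {article_id: {other_id: similarity}} (assumed layout).
    """
    cur = conn.cursor()
    for article_id, others in scores.items():
        ranked = sorted(others.items(), key=lambda kv: kv[1], reverse=True)[:n]
        relate = ",".join(str(other_id) for other_id, _ in ranked)
        # Parameterized query: values are passed separately from the SQL text.
        cur.execute("UPDATE v9_news SET relate = ? WHERE id = ?", (relate, article_id))
    conn.commit()


# In-memory stand-in for the real table, reduced to the two columns used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v9_news (id INTEGER PRIMARY KEY, relate TEXT)")
conn.executemany("INSERT INTO v9_news (id) VALUES (?)", [(1,), (2,), (3,)])
save_related(conn, {1: {2: 0.5, 3: 0.8}})
```

After the call, the relate field of article 1 holds the ids ordered from most to least similar.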

3. Reading the precomputed similar-article ids in a PHPCMS template

{pc:get sql="select * from v9_news where (id=$id and status=99)"}
  {loop $data $uu}
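    {if $uu[relate] != ''} //check for an empty value: if relevance has not been computed yet the field is empty, and the lookup would fail
      {php $URLid=explode(',',$uu[relate]);}     //split the comma-separated similar-article ids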
         {loop $URLid $ab}
            {pc:get sql="select * from v9_news where id=$ab"}
               {loop $data $bc}
                  <a href="{$bc[url]}" target="_blank">{$bc[title]}</a>
               {/loop}
           {/pc}
       {/loop}
    {/if}
  {/loop}
{/pc}

4. Have Python pick a random article that is pending review (status 1) and set its status field to 99; the article is then published.
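A minimal Python 3 sketch of this step, again using the standard-library `sqlite3` module and an in-memory table as a stand-in for the real MySQL table. The table layout is reduced to the two columns involved, and `publish_random_pending` is a hypothetical name. In MySQL the random ordering would be written `ORDER BY RAND()` rather than `ORDER BY RANDOM()`.

```python
import sqlite3


def publish_random_pending(conn):
    """Pick one random article with status 1, set it to 99, and return its id.

    Returns None when no article is waiting for review.
    """
    cur = conn.cursor()
    cur.execute("SELECT id FROM v9_news WHERE status = 1 ORDER BY RANDOM() LIMIT 1")
    row = cur.fetchone()
    if row is None:
        return None
    cur.execute("UPDATE v9_news SET status = 99 WHERE id = ?", (row[0],))
    conn.commit()
    return row[0]


# In-memory stand-in: two pending articles and one already published.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v9_news (id INTEGER PRIMARY KEY, status INTEGER)")
conn.executemany("INSERT INTO v9_news (id, status) VALUES (?, ?)", [(1, 1), (2, 1), (3, 99)])
published_id = publish_random_pending(conn)
```

Each call publishes at most one article, so running it on a schedule releases the reviewed backlog gradually.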

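If the site serves generated static HTML pages, one more step is needed: batch-generating pages for articles that have passed review but do not yet have a static page.

This requires modifying create_html.php in the phpcms/modules/content/ directory,

as well as url.class.php and html.class.php under phpcms/modules/content/class/.

After that, following the MVC routing convention, you only need to pass the parameters in the URL, for example: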

http://www.baidu.com/index.php?m=content&c=create_html&a=showzd&dosubmit=1&s=1&modelid=27&siteid=10&str=1093

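5. The last step is to schedule the Python update script on the Linux server; if static pages need to be generated, have the scheduled job request that URL as well.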
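The request itself could be issued from the same Python script. The Python 3 sketch below rebuilds the example URL from its query parameters and fetches it with the standard library. The helper names are hypothetical, `showzd` is taken from the example URL as an action on the modified create_html.php, and the meaning of the `s` and `str` parameters is not stated in the article, so they are passed through unchanged.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_create_html_url(base_url, modelid, siteid, str_value):
    """Assemble the page-generation URL from the parameters in the example."""
    params = [
        ("m", "content"),
        ("c", "create_html"),
        ("a", "showzd"),
        ("dosubmit", 1),
        ("s", 1),
        ("modelid", modelid),
        ("siteid", siteid),
        ("str", str_value),
    ]
    return base_url + "?" + urlencode(params)


def trigger_create_html(url, timeout=30):
    """Request the URL and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


url = build_create_html_url("http://www.baidu.com/index.php", 27, 10, 1093)
```

A scheduler such as cron can then run the script at a fixed interval and call `trigger_create_html(url)` after the database update.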

Article link: https://lsjlt.com/news/183578.html (please cite this source link when reposting)
