Adding internal links to a website with Python: segmenting titles to batch-generate keywords and their article links
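Background:
Xiaohuangji (小黄鸡经验网) gets what you might call laid-back, "Buddha-style" SEO: I've skipped a lot of what SEO normally calls for. For a long time, I had filled in only the title out of the homepage's three key meta tags (title, keywords, description). I have plenty of ideas, but I've been too busy reading novels and binge-watching movies (in other words, lazy). Today I'm adding internal links to the site's articles. As of today the site has published 3,000 articles, so adding internal links one article at a time is clearly out of the question. Let Python handle it.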
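Approach:
1. Crawl the Xiaohuangji sitemap.
2. Extract each article's title and its link.
3. Segment each title into words, generate keyword/link pairs, and write them to a txt file.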
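Step 3 is the core of the approach. Here is a minimal sketch that separates it from crawling and file I/O. The function names `title_keywords` and `build_link_lines` are hypothetical. The tokenizer is passed in so the logic can be tried without jieba; with jieba installed you would pass `jieba.cut_for_search`, as the script below does. The stop-word list and the "longer than 2 characters" filter mirror that script.

```python
from typing import Callable, Iterable, List, Tuple

# Punctuation and filler words stripped from titles before segmenting
# (half-width and full-width forms of ? and ,), mirroring the script.
STOP_WORDS = ['?', '？', ',', '，', '的', '什么', '如何', '怎么', '意思', '原因', '比较']


def title_keywords(title: str, tokenize: Callable[[str], Iterable[str]],
                   min_len: int = 3) -> List[str]:
    """Strip stop words, segment the title, keep tokens of at least
    min_len characters, and de-duplicate while keeping first-seen order."""
    for w in STOP_WORDS:
        title = title.replace(w, '')
    tokens = [t for t in tokenize(title) if len(t) >= min_len]
    return list(dict.fromkeys(tokens))  # ordered de-duplication


def build_link_lines(pairs: Iterable[Tuple[str, str]],
                     tokenize: Callable[[str], Iterable[str]]) -> List[str]:
    """Turn (title, href) pairs into 'keyword,href' lines."""
    lines = []
    for title, href in pairs:
        for kw in title_keywords(title, tokenize):
            lines.append(f"{kw},{href}")
    return lines


if __name__ == "__main__":
    # Stand-in tokenizer for illustration: a pre-segmented title split on spaces.
    demo = [("小黄鸡 经验网 是什么意思?", "/a/1.html")]
    for line in build_link_lines(demo, str.split):
        print(line)
```

Passing the tokenizer as a parameter keeps the filtering rules testable on their own. Swapping `str.split` for `jieba.cut_for_search` changes only how titles are cut.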
Example code:
Without further ado. Even I feel sick looking at my own code... (sheds the tears of the technically unskilled)
```python
import jieba
import requests
from bs4 import BeautifulSoup

sitemap_url = "https://www.shenhuangji.com/sitemaps.html"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0'
}

# Step 1: fetch the sitemap and extract each article's title and href
res = requests.get(url=sitemap_url, headers=headers, timeout=10)
res.raise_for_status()  # fail here instead of later on an undefined result list
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, 'html.parser')
result = []
for item in soup.find('div', class_='articles').find_all('li'):
    a = item.find('a')
    title = a.text.strip()
    href = a['href']
    result.append(title + "," + href)

# Step 2: write the title,href pairs to a txt file
with open('./title_href.txt', 'w', encoding='utf-8') as fp:
    for r in result:
        fp.write(r + '\n')

# Step 3: read back the file from step 2 (whole lines, so titles with spaces stay intact)
with open('./title_href.txt', 'r', encoding='utf-8') as f:
    dic = [line.rstrip('\n') for line in f if line.strip()]

# Step 4: segment each title into keywords
# Crude cleanup: punctuation (half- and full-width) and filler words to strip from titles
stop_words = ['?', '？', ',', '，', '的', '什么', '如何', '怎么', '意思', '原因', '比较']
neilian = []
for d in dic:
    # Split on the LAST comma so a title that contains commas is not cut short
    title, sep, href = d.rpartition(',')
    if not sep:
        print('Failed to process line:', d)
        continue
    for w in stop_words:
        title = title.replace(w, '')
    title_seq_list = jieba.cut_for_search(title)
    # Keep only tokens longer than 2 characters
    tt = [t for t in title_seq_list if len(t) > 2]
    # Remove duplicates while preserving the original order
    for kw in dict.fromkeys(tt):
        neilian.append(kw + "," + href)

# Step 5: write the internal-link keyword file
with open('./小黄鸡内链.txt', 'w', encoding='utf-8') as fp:
    for r in neilian:
        fp.write(r + '\n')
```
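The script stops at producing `小黄鸡内链.txt`. The post doesn't show how that file is applied to articles, so what follows is only one possible next step, a minimal sketch under stated assumptions. `load_link_map` and `add_internal_links` are hypothetical helpers. The article body is treated as plain text. A keyword that maps to several articles keeps its first link, and `max_links=5` is an arbitrary cap. Only the standard library (`re`, `html`) is used.

```python
import html
import re
from typing import Dict


def load_link_map(path: str) -> Dict[str, str]:
    """Read 'keyword,href' lines. If a keyword appears more than once,
    keep the first href (an assumed policy; the post does not choose one)."""
    links: Dict[str, str] = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            # Split on the last comma so the keyword itself may contain commas
            kw, sep, href = line.strip().rpartition(',')
            if sep and kw and kw not in links:
                links[kw] = href
    return links


def add_internal_links(text: str, links: Dict[str, str],
                       self_href: str = '', max_links: int = 5) -> str:
    """Wrap the first occurrence of each keyword in `text` in an <a> tag.

    Assumes plain text: this is not HTML-aware, so on raw HTML it could
    match inside tags or inside links that already exist.
    """
    # Drop empty keywords and keywords pointing back to the current article
    candidates = [kw for kw, href in links.items() if kw and href != self_href]
    if not candidates:
        return text
    # Longest first: re tries alternatives left to right at each position,
    # so a longer keyword wins over a shorter one it contains.
    candidates.sort(key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(kw) for kw in candidates))
    used = set()

    def repl(m):
        kw = m.group(0)
        if kw in used or len(used) >= max_links:
            return kw  # already linked once, or the link budget is spent
        used.add(kw)
        return f'<a href="{html.escape(links[kw])}">{kw}</a>'

    # re.sub scans the original text once, so inserted tags are never re-matched
    return pattern.sub(repl, text)
```

Usage might look like `links = load_link_map('./小黄鸡内链.txt')` followed by `add_internal_links(body, links, self_href=article_url)` for each article. One regex alternation scanned once by `re.sub` avoids a pitfall of chained `str.replace` calls: a short keyword would otherwise be matched again inside a link already inserted for a longer one.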