Using Python to compute the ratio of words that occur only once in a list (a clean_up method is called to strip the trailing \n)
Source: 学生作业帮助网 Editor: 六六作业网 Date: 2024/12/04 01:44:15
How do I use Python to compute the ratio of words that occur only once in a list? (A clean_up method is called to strip the trailing \n.)
def hapax_legomena_ratio(text):
    """ (list of str) -> float

    Precondition: text is non-empty. Each str in text ends with \n and
    text contains at least one word.

    Return the hapax legomena ratio for text. This ratio is the number of
    words that occur exactly once divided by the total number of words.

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
    ...         'James Gosling\n']
    >>> hapax_legomena_ratio(text)
    0.7777777777777778
    """
    t = []
    for string in text:
        t.append(clean_up(string))
    at_least_once = []   # only two lists may be created for the computation; this one holds words seen at least once
    at_least_twice = []  # this one holds words seen at least twice
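After that, the ratio should presumably be computed from the lengths of the lists, but I don't know how to write the body in between.
I haven't learned regular expressions.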
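The question calls clean_up without showing it. As a minimal sketch only, assuming that clean_up lowercases a string and strips whitespace and punctuation from both ends (the assignment's real helper may behave differently), it can be written without regular expressions:

```python
import string

def clean_up(s):
    """Hypothetical stand-in for the assignment's clean_up helper.

    Assumption: it lowercases s and strips whitespace and punctuation
    from both ends. Punctuation inside the string is left alone.
    """
    return s.strip(string.whitespace + string.punctuation).lower()

print(clean_up('James Gosling\n'))          # james gosling
print(clean_up('Peter, Paul, and Mary\n'))  # peter, paul, and mary
```

Note that under this assumption the commas inside a line survive, so each word would still need its own punctuation stripped after splitting.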
I'm not sure what the clean_up function does, so the whole function below is written from scratch. Use it as a reference.
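def hapax_legomena_ratio(text):
    at_least_once = []   # each distinct word seen so far
    at_least_twice = []  # each distinct word seen two or more times
    total = 0            # total number of words
    for s in text:
        for word in s.strip().split():
            word = word.strip('.,;')  # drop punctuation at either end
            total += 1
            if word not in at_least_once:
                at_least_once.append(word)
            elif word not in at_least_twice:
                at_least_twice.append(word)
    # words seen exactly once = seen at least once - seen at least twice
    return (1.0 * (len(at_least_once) - len(at_least_twice))) / total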
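The code is simple. The first part splits each string into words and counts the total number of words (total += 1).
The second part is the core: it records whether each word has appeared at least once or at least twice.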
    if word not in at_least_once:
        at_least_once.append(word)
    elif word not in at_least_twice:
        at_least_twice.append(word)
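To see how the two lists evolve, here is a small trace of the same if/elif logic on an already-split word list. The function name track and the sample input are made up for illustration:

```python
def track(words):
    """Return (at_least_once, at_least_twice) for a list of words."""
    at_least_once = []
    at_least_twice = []
    for word in words:
        if word not in at_least_once:
            # first time this word is seen
            at_least_once.append(word)
        elif word not in at_least_twice:
            # second time this word is seen
            at_least_twice.append(word)
        # third and later occurrences change nothing
    return at_least_once, at_least_twice

once, twice = track(['a', 'b', 'a', 'a', 'c'])
print(once)   # ['a', 'b', 'c']
print(twice)  # ['a']
```

The third 'a' falls through both branches, so a word is never recorded more than once in either list.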
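The final return statement computes the ratio of the number of words that occur exactly once (words seen at least once minus words seen at least twice) to the total number of words.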
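As a cross-check on that arithmetic, the same ratio can be computed with collections.Counter from the standard library. This is only a sanity check: it does not follow the assignment's two-list restriction. The word list is the docstring example split by hand and lowercased, and the name hapax_ratio_counter is made up here. It assumes Python 3, where / is true division.

```python
from collections import Counter

def hapax_ratio_counter(words):
    """Ratio of words occurring exactly once to the total word count."""
    counts = Counter(words)
    hapax = sum(1 for c in counts.values() if c == 1)
    return hapax / len(words)

words = ['james', 'fennimore', 'cooper', 'peter', 'paul', 'and',
         'mary', 'james', 'gosling']

# 8 distinct words, 1 of them ('james') repeated, 9 words in total:
# (8 - 1) / 9 = 7 / 9
print(hapax_ratio_counter(words))  # 0.7777777777777778
```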