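[Natural Language Processing with Python] 5.3 Mapping Words to Properties Using Python Dictionaries
Added: 2013-5-26
The dictionary data type (other programming languages may call it an associative array or a hash array)
Indexing lists vs. dictionaries (omitted)
Python dictionaries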
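# Initialize an empty dictionary
>>> pos = {}
# Other useful dictionary methods: pos.keys(), pos.values(), pos.items()
# Define a non-empty dictionary
>>> pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}
>>> pos = dict(colorless='ADJ', ideas='N', sleep='V', furiously='ADV')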
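The first form is the one normally used. Note that dictionary keys must be immutable: a string or tuple can serve as a key, but a list cannot.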
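To make the point about keys concrete, here is a minimal sketch. It builds the same mapping in both ways and then tries a list as a key. The helper name `try_list_key` and the `pair_tags` example are hypothetical and only there for illustration.

```python
# Two equivalent ways to build the same mapping from words to tags.
pos_literal = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}
pos_keywords = dict(colorless='ADJ', ideas='N', sleep='V', furiously='ADV')
same = pos_literal == pos_keywords


def try_list_key():
    """Hypothetical helper: try to use a list (mutable) as a dictionary key."""
    d = {}
    try:
        d[['ideas', 'sleep']] = 'N'
    except TypeError as err:
        # Lists are unhashable, so Python refuses them as keys.
        return str(err)
    return None


error_message = try_list_key()

# A tuple is immutable, so it is accepted as a key.
pair_tags = {('ideas', 'sleep'): 'N V'}
```

The dict() keyword form only works when every key is a valid Python identifier, which is one reason the literal form is the usual choice.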
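Default dictionaries
With a default dictionary, accessing a key that does not exist creates an entry with a default value instead of raising an error.
Setting a default data type: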
>>> import nltk
>>> frequency = nltk.defaultdict(int)
>>> frequency['colorless'] = 4
>>> frequency['ideas']
0
>>> pos = nltk.defaultdict(list)
>>> pos['sleep'] = ['N', 'V']
>>> pos['ideas']
[]
Setting a default value:
>>> pos = nltk.defaultdict(lambda: 'N')
>>> pos['colorless'] = 'ADJ'
>>> pos['blog']
'N'
>>> pos.items()
[('blog', 'N'), ('colorless', 'ADJ')]
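The same behaviour can be sketched without NLTK. The block below assumes that the standard library's `collections.defaultdict` is an acceptable stand-in for `nltk.defaultdict`. It also shows a detail that is easy to miss: looking up a missing key stores the default, while a membership test does not.

```python
from collections import defaultdict

# int() returns 0, so unseen words start with a count of zero.
frequency = defaultdict(int)
frequency['colorless'] = 4
missing_count = frequency['ideas']

# A lambda lets us choose an arbitrary default value, here the tag 'N'.
pos = defaultdict(lambda: 'N')
pos['colorless'] = 'ADJ'
guessed = pos['blog']

# The lookup above created an entry for 'blog'.
stored = sorted(pos.items())

# A membership test does not create an entry.
has_cat = 'cat' in pos
size_after_test = len(pos)
```

If silently growing the dictionary is undesirable, `pos.get(key, 'N')` on a plain dict gives a default without storing it.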
Incrementally updating a dictionary
# Incrementally update a dictionary, then sort it by value
>>> counts = nltk.defaultdict(int)
>>> from nltk.corpus import brown
# simplify_tags=True (NLTK 2.x) produces the simplified tags shown below
>>> for (word, tag) in brown.tagged_words(categories='news', simplify_tags=True):
...     counts[tag] += 1
...
>>> counts['N']
22226
>>> list(counts)
['FW', 'DET', 'WH', "''", 'VBZ', 'VB+PPO', "'", ')', 'ADJ', 'PRO', '*', '-', ...]
>>> from operator import itemgetter
>>> sorted(counts.items(), key=itemgetter(1), reverse=True)
[('N', 22226), ('P', 10845), ('DET', 10648), ('NP', 8336), ('V', 7313), ...]
>>> [t for t, c in sorted(counts.items(), key=itemgetter(1), reverse=True)]
['N', 'P', 'DET', 'NP', 'V', 'ADJ', ',', '.', 'CNJ', 'PRO', 'ADV', 'VD', ...]
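The count-then-sort pattern can be tried without downloading the Brown corpus. In the sketch below, the `tagged` list is made-up toy data standing in for `brown.tagged_words(...)`, and `collections.defaultdict` is assumed as a substitute for `nltk.defaultdict`.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical toy data: (word, tag) pairs, not taken from any corpus.
tagged = [('the', 'DET'), ('dog', 'N'), ('saw', 'V'),
          ('a', 'DET'), ('cat', 'N'), ('dog', 'N')]

# Incrementally build the tag counts; unseen tags start at 0.
counts = defaultdict(int)
for word, tag in tagged:
    counts[tag] += 1

# itemgetter(1) selects the count from each (tag, count) pair.
by_freq = sorted(counts.items(), key=itemgetter(1), reverse=True)

# Keep only the tags, most frequent first.
tags_only = [t for t, c in by_freq]
```

`collections.Counter` with its `most_common()` method covers the same task in fewer lines. The explicit loop is kept here because it mirrors the session above.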
# A general accumulation task implemented by hand, compared with the simpler approach that nltk.Index() provides
>>> words = nltk.corpus.words.words('en')
>>> anagrams = nltk.defaultdict(list)
>>> for word in words:
...     key = ''.join(sorted(word))
...     anagrams[key].append(word)
...
>>> anagrams['aeilnrt']
['entrail', 'latrine', 'ratline', 'reliant', 'retinal', 'trenail']
>>> anagrams = nltk.Index((''.join(sorted(w)), w) for w in words)
>>> anagrams['aeilnrt']
['entrail', 'latrine', 'ratline', 'reliant', 'retinal', 'trenail']
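The grouping idea is that every anagram of a word sorts to the same string of letters, so that sorted string can act as the dictionary key. A minimal sketch on a small hand-picked word list follows. The list is an assumption for illustration: the six 'aeilnrt' words come from the output above, and 'listen' and 'silent' are added here.

```python
from collections import defaultdict

# Small sample list standing in for the NLTK words corpus.
words = ['entrail', 'latrine', 'ratline', 'reliant',
         'retinal', 'trenail', 'listen', 'silent']

anagrams = defaultdict(list)
for word in words:
    # Sorting the letters gives a canonical key shared by all anagrams.
    key = ''.join(sorted(word))
    anagrams[key].append(word)

group = anagrams['aeilnrt']
other_group = anagrams['eilnst']
```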
Inverting a dictionary
>>> pos2 = nltk.Index((value, key) for (key, value) in pos.items())
>>> pos2['ADV']
['peacefully', 'furiously']
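Because several words can share one tag, the inverted dictionary has to map each tag to a list of words. The sketch below does this with `collections.defaultdict(list)`, used here as an assumed plain-Python substitute for `nltk.Index`. The 'peacefully' entry is added to the earlier `pos` example as an assumption so that 'ADV' has two words, as in the output above.

```python
from collections import defaultdict

# Word-to-tag mapping; 'peacefully' is an illustrative addition.
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V',
       'furiously': 'ADV', 'peacefully': 'ADV'}

# Invert it: each tag collects every word that carries it.
pos2 = defaultdict(list)
for word, tag in pos.items():
    pos2[tag].append(word)

adverbs = sorted(pos2['ADV'])
```

A plain `{value: key for key, value in pos.items()}` would keep only one word per tag, which is why a list-valued mapping is used.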
Summary of commonly used dictionary methods and idioms