{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lattice Programming\n",
"\n",
"## What is a lattice\n",
"\n",
"- Very generic math structure\n",
"- For practical purposes: just a **finite** set with a subset relation\n",
"- Paired with ``fixpoint operators``: functions that either\n",
"  - Take a point and follow the lines upwards\n",
"  - Stay in the same place\n",
"- Conditions above can be relaxed, but at the cost of complexity\n",
"\n",
"## Lattice programming\n",
"- Partial solutions\n",
"- Guaranteed termination\n",
"- Very modular / elegant\n",
"- A sane way to define recursive functions\n",
"- Not an implementation, just an underlying theory to build upon\n",
"\n",
"## Example Visualization\n",
"\n"
]
},
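The bullets above can be made concrete with a minimal, self-contained sketch (the operator and all names below are illustrative, not part of this notebook's later code): a powerset lattice over {1, 2, 3, 4} with an inflationary operator that only ever moves upward, so iteration is guaranteed to terminate.

```python
# Illustrative sketch: a finite powerset lattice over {1, 2, 3, 4}.
# The operator is inflationary ("follow the lines upwards or stay put"),
# so repeated application must terminate on a finite lattice.

def close_under_successor(s: frozenset) -> frozenset:
    # Add n+1 for every n already present (capped at 4); never removes anything.
    return s | {n + 1 for n in s if n < 4}

def fixpoint(op, start):
    while (nxt := op(start)) != start:
        start = nxt
    return start

print(fixpoint(close_under_successor, frozenset({1})))  # reaches {1, 2, 3, 4}
```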
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [],
"source": [
"from graphviz import Digraph\n",
"from IPython.display import Image\n",
"\n",
"def new_graph(name: str):\n",
"    return Digraph(\n",
"        name=name,\n",
"        node_attr={\"shape\": \"circle\", \"style\": \"filled\"},\n",
"        graph_attr={\"bgcolor\": \"transparent\", \"splines\": \"line\", \"rankdir\": \"BT\"},\n",
"        edge_attr={\"arrowsize\": \"0.75\"},\n",
"        strict=True,\n",
"    )\n",
"\n",
"def render(dot: Digraph):\n",
"    dot.render(filename=dot.name, format=\"png\", directory=\"./graphs\")\n",
"    return f\"graphs/{dot.name}.png\"\n"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from more_itertools import powerset, distinct_combinations\n",
"\n",
"graph = new_graph(\"graph1\")\n",
"\n",
"domain = {1, 2, 3, 4}\n",
"domain_subsets = tuple((str(i), set(subset)) for i, subset in enumerate(powerset(domain)))\n",
"\n",
"\n",
"for label, subset in domain_subsets:\n",
"    graph.node(name=label, label=str(subset))\n",
"\n",
"for (labelA, subsetA), (labelB, subsetB) in distinct_combinations(domain_subsets, 2):\n",
"    if len(subsetB) - len(subsetA) == 1 and subsetA.issubset(subsetB):\n",
"        graph.edge(labelA, labelB)\n",
"\n",
"\n",
"Image(render(graph))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example Fixpoint Operator\n",
"\n",
"- Computation can start at the TOP or BOTTOM and move downwards or upwards in the lattice\n",
"- An operator is just a function that takes an element and moves in a consistent direction\n",
"\n",
"## Problem\n",
"\n",
"- A diagnostic approach to part-of-speech tagging (POS tagging)\n",
"  - Diagnostics propose a tag, then possibly refute it\n",
"- Tags used here: DET, NOUN(OBJ), NOUN(SUBJECT), VERB\n",
"### Lattice\n",
"- Domain: pairs (word, tag), where tag is a candidate part of speech for word\n",
"- Computation: start with the TOP, apply every operator in composition until a fixpoint is reached"
]
},
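A tiny standalone version of this scheme (two words, two tags; the names are mine and deliberately simpler than the notebook's later cells): start at the TOP where every word has every tag, compose refuting operators, and iterate to a fixpoint.

```python
# Toy domain: every (word, tag) pair is initially possible (the TOP element).
top = {("fox", "NOUN"), ("fox", "VERB"), ("jumps", "NOUN"), ("jumps", "VERB")}

# Each operator only removes pairs, i.e. it moves downward in the lattice.
def jumps_is_verb(s):
    return {(w, t) for w, t in s if w != "jumps" or t == "VERB"}

def fox_is_noun(s):
    return {(w, t) for w, t in s if w != "fox" or t == "NOUN"}

def fixpoint(op, x):
    while (nxt := op(x)) != x:
        x = nxt
    return x

# Apply both operators in composition until nothing changes.
result = fixpoint(lambda s: fox_is_noun(jumps_is_verb(s)), top)
print(result)  # only ('fox', 'NOUN') and ('jumps', 'VERB') survive
```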
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(('the', 'DET'), ('the', 'NOUN(OBJ)'), ('the', 'NOUN(SUBJECT)'), ('the', 'VERB'), ('quick', 'DET'), ('quick', 'NOUN(OBJ)'), ('quick', 'NOUN(SUBJECT)'), ('quick', 'VERB'), ('brown', 'DET'), ('brown', 'NOUN(OBJ)'), ('brown', 'NOUN(SUBJECT)'), ('brown', 'VERB'), ('fox', 'DET'), ('fox', 'NOUN(OBJ)'), ('fox', 'NOUN(SUBJECT)'), ('fox', 'VERB'), ('jumps', 'DET'), ('jumps', 'NOUN(OBJ)'), ('jumps', 'NOUN(SUBJECT)'), ('jumps', 'VERB'), ('over', 'DET'), ('over', 'NOUN(OBJ)'), ('over', 'NOUN(SUBJECT)'), ('over', 'VERB'), ('the', 'DET'), ('the', 'NOUN(OBJ)'), ('the', 'NOUN(SUBJECT)'), ('the', 'VERB'), ('lazy', 'DET'), ('lazy', 'NOUN(OBJ)'), ('lazy', 'NOUN(SUBJECT)'), ('lazy', 'VERB'), ('dog', 'DET'), ('dog', 'NOUN(OBJ)'), ('dog', 'NOUN(SUBJECT)'), ('dog', 'VERB'))\n",
"There are 36 elements in the domain\n",
"Which means there are (2^36 = 68719476736) possible subsets\n",
"Can't be visualized, but can easily be encoded with bitsets/bitflags\n"
]
}
],
"source": [
"from itertools import product\n",
"\n",
"words = (\"the\", \"quick\", \"brown\", \"fox\", \"jumps\", \"over\", \"the\", \"lazy\", \"dog\")\n",
"tags = (\"DET\", \"NOUN(OBJ)\", \"NOUN(SUBJECT)\", \"VERB\")\n",
"\n",
"\n",
"domain = tuple(product(words, tags))\n",
"\n",
"print(domain)\n",
"print(f\"There are {len(domain)} elements in the domain\")\n",
"print(f\"Which means there are (2^{len(domain)} = {2**len(domain)}) possible subsets\")\n",
"print(\"Can't be visualized, but can easily be encoded with bitsets/bitflags\")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linguistic Diagnosis\n",
"\n",
"- Method stolen from classical linguistics ([uoa reference](https://ezpa.library.ualberta.ca/ezpAuthen.cgi?url=https://academic.oup.com/book/12008/chapter-abstract/161275349?redirectedFrom=fulltext)) ([non uoa, gated](https://academic.oup.com/book/12008/chapter/161275349?login=true))\n",
"- Rooted in generative grammar\n",
"- Don't want to misrepresent it; it's a lot more sophisticated than what I'm presenting here\n",
"- General principle: apply rules that rule out possible interpretations\n",
"- I'm not an expert in linguistic diagnosis, I just think it's cool\n",
"- Given a template sentence, decide whether it's true when instantiated\n",
"- E.g. \"The dog kicked the ball\" -> if (\"Who kicked the ball?\", \"Dog\") is true, then dog is NOUN(SUBJECT)\n",
"- Oracle-driven\n",
"  - In classical linguistics, the linguist is the oracle\n",
"  - Oracle can be a trained NN\n",
"  - Or a corpus/database\n",
"  - ... doesn't really matter"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(('the', 'DET'), ('the', 'NOUN(OBJ)'), ('quick', 'DET'), ('quick', 'NOUN(OBJ)'), ('brown', 'DET'), ('brown', 'NOUN(OBJ)'))\n",
"{('quick', 'NOUN(OBJ)'), ('fox', 'NOUN(SUBJECT)'), ('quick', 'DET'), ('the', 'NOUN(OBJ)'), ('brown', 'DET'), ('the', 'DET')}\n"
]
}
],
"source": [
"# Diagnostic functions, lattice operators\n",
"\n",
"LatticeElement = set[tuple[str, str]]\n",
"\n",
"def adjacent_right_tags(word: str, element: LatticeElement) -> tuple[str, ...]:\n",
"    i = words.index(word)\n",
"    if i >= len(words)-1:\n",
"        return tuple()\n",
"    word = words[i+1]\n",
"    return tuple(tag for oword, tag in element if word == oword)\n",
"\n",
"def jumps_is_verb(element: LatticeElement) -> LatticeElement:\n",
"    return {\n",
"        (word, tag)\n",
"        for word, tag in element\n",
"        if word != \"jumps\" or tag == \"VERB\"\n",
"    }\n",
"\n",
"#print(jumps_is_verb({(\"the\", \"DET\"), (\"the\", \"VERB\"), (\"jumps\", \"NOUN(SUBJECT)\"), (\"jumps\", \"VERB\")}))\n",
"\n",
"# Don't consider anything a DET if it doesn't come right before a NOUN(SUBJECT) or NOUN(OBJ)\n",
"def remove_det_nonoun(element: LatticeElement) -> LatticeElement:\n",
"    return {\n",
"        (word, tag) for word, tag in element\n",
"        if tag != \"DET\" or {\"NOUN(OBJ)\", \"NOUN(SUBJECT)\"}.intersection(adjacent_right_tags(word, element))\n",
"    }\n",
"\n",
"# Can't have two nouns next to each other (e.g. dog cat)\n",
"def remove_noun_nodoubles(element: LatticeElement) -> LatticeElement:\n",
"    return {\n",
"        (word, tag) for word, tag in element\n",
"        if tag not in (\"NOUN(OBJ)\", \"NOUN(SUBJECT)\") or not ({\"NOUN(OBJ)\", \"NOUN(SUBJECT)\"}.issuperset(x := adjacent_right_tags(word, element)) and len(x) != 0)\n",
"    }\n",
"\n",
"\n",
"from itertools import chain\n",
"first_three = words[:3]\n",
"first_three_nouns_dets = tuple(chain.from_iterable(((word, \"DET\"), (word, \"NOUN(OBJ)\")) for word in first_three))\n",
"print(first_three_nouns_dets)\n",
"print(remove_noun_nodoubles({*first_three_nouns_dets, (\"fox\", \"NOUN(SUBJECT)\")}))\n",
"\n",
"# A verb must be immediately preceded by a noun\n",
"def noun_immediately_precedes_verb(element: LatticeElement) -> LatticeElement:\n",
"    return {\n",
"        (word, tag) for word, tag in element\n",
"        if tag in (\"NOUN(OBJ)\", \"NOUN(SUBJECT)\") or (\"VERB\",) != adjacent_right_tags(word, element)\n",
"    }\n",
"\n",
"#print(noun_immediately_precedes_verb({(\"fox\", \"DET\"), (\"fox\", \"NOUN(OBJ)\"), (\"jumps\", \"VERB\")}))\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The lattice top element, every word has every tag:\n",
"Start size = 32\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 1 elements\n",
"{('dog', 'DET')}\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 3 elements\n",
"{('jumps', 'DET'), ('jumps', 'NOUN(OBJ)'), ('jumps', 'NOUN(SUBJECT)')}\n",
"Iteration 0 size = 28\n",
"{('brown', 'DET'),\n",
" ('brown', 'NOUN(OBJ)'),\n",
" ('brown', 'NOUN(SUBJECT)'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'DET'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('fox', 'VERB'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'DET'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 1 elements\n",
"{('fox', 'DET')}\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 0 elements\n",
"set()\n",
"Iteration 1 size = 27\n",
"{('brown', 'DET'),\n",
" ('brown', 'NOUN(OBJ)'),\n",
" ('brown', 'NOUN(SUBJECT)'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('fox', 'VERB'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'DET'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 0 elements\n",
"set()\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 0 elements\n",
"set()\n",
"Reached Fixedpoint\n"
]
},
{
"data": {
"text/plain": [
"{('brown', 'DET'),\n",
" ('brown', 'NOUN(OBJ)'),\n",
" ('brown', 'NOUN(SUBJECT)'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('fox', 'VERB'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'DET'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Every word needs at least one part of speech\n",
"def consistent(element: LatticeElement) -> bool:\n",
"    return len(set(word for word, _ in element)) == len(set(words))\n",
"\n",
"print(\"The lattice top element, every word has every tag:\")\n",
"assert consistent(domain)\n",
"assert not consistent({(\"jumps\", \"VERB\")})\n",
"\n",
"# Every word has exactly one tag\n",
"def model(element: LatticeElement) -> bool:\n",
"    return consistent(element) and len(element) == len(set(words))\n",
"\n",
"assert not model(domain)\n",
"assert model({(word, \"DET\") for word in words})\n",
"\n",
"operators = [jumps_is_verb, remove_det_nonoun, remove_noun_nodoubles, noun_immediately_precedes_verb]\n",
"\n",
"def with_info(operator, x: LatticeElement) -> LatticeElement:\n",
"    print(f\"Calling with {operator.__name__}\")\n",
"    result = operator(x)\n",
"    print(f\"{operator.__name__} removed {len(x) - len(result)} elements\")\n",
"    print(x.difference(result))\n",
"    return result\n",
"\n",
"def compose(f1, f2):\n",
"    return lambda x: f1(f2(x))\n",
"\n",
"from functools import reduce, partial\n",
"from pprint import pprint\n",
"all_operator = reduce(compose, map(partial(partial, with_info), operators))\n",
"\n",
"def fixpoint(op, start):\n",
"    print(f\"Start size = {len(start)}\")\n",
"    i = 0\n",
"    while (succ := op(start)) != start:\n",
"        print(f\"Iteration {i} size = {len(succ)}\")\n",
"        pprint(succ)\n",
"        start = succ\n",
"        i += 1\n",
"    print(\"Reached Fixedpoint\")\n",
"    return start\n",
"\n",
"\n",
"fixpoint(all_operator, set(domain))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"1. Build a sentence for which the above operators produce inconsistent results\n",
"1. Construct more operators that complete the sentence\n",
"1. Rebuild the code so that it starts with the empty set and adds the POS tags it refutes"
]
},
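For the last exercise, one possible shape (a sketch over my own simplified two-word domain, not a worked solution): run the computation from the BOTTOM by accumulating *refuted* pairs, so operators only ever add elements and the iteration still climbs a finite lattice.

```python
# Sketch for the "start from the empty set" exercise: grow the set of
# refuted (word, tag) pairs; operators may only add refutations, so the
# computation moves upward in a finite lattice and must terminate.
words = ("fox", "jumps")
tags = ("NOUN", "VERB")
domain = {(w, t) for w in words for t in tags}

def refute_jumps_nonverb(refuted):
    # Dual of a refuting operator: rule out every non-VERB reading of "jumps".
    return refuted | {(w, t) for w, t in domain if w == "jumps" and t != "VERB"}

def fixpoint(op, x):
    while (nxt := op(x)) != x:
        x = nxt
    return x

refuted = fixpoint(refute_jumps_nonverb, set())
surviving = domain - refuted  # same answer as the top-down version
print(surviving)
```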
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Further Ideas\n",
"## A fuzzier approach\n",
"\n",
"- We're expecting a lot from our diagnostics by requiring them to \"entirely refute\" a POS; a graded approach would be better\n",
"- That's possible!\n",
"- Now we have triples (word, tag, confidence) and give the diagnostics the ability to lower a confidence level\n",
"- But this is no longer a simple subset relation, so it needs more general-purpose lattice theory."
]
}
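A sketch of what that graded variant could look like (the quantization to 0.25 steps and all names are my own assumptions, chosen to keep the lattice finite): one confidence per (word, tag) pair, ordered pointwise, with operators that may only lower confidences.

```python
# Graded sketch: a confidence in [0, 1] per (word, tag) pair, quantized to
# multiples of 0.25 so the lattice stays finite and iteration must terminate.
conf = {("jumps", "VERB"): 1.0, ("jumps", "NOUN"): 1.0}

def doubt_jumps_noun(c):
    # Operators may only *lower* confidences (move consistently downward).
    out = dict(c)
    out[("jumps", "NOUN")] = max(0.0, out[("jumps", "NOUN")] - 0.25)
    return out

def fixpoint(op, x):
    while (nxt := op(x)) != x:
        x = nxt
    return x

final = fixpoint(doubt_jumps_noun, conf)
print(final)  # confidence in ('jumps', 'NOUN') driven down to 0.0
```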
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}