{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lattice Programming\n",
"\n",
"## What is a lattice\n",
"\n",
"- Very generic math structure\n",
"- For practical purposes: just a **finite** set with a subset relation\n",
"- Paired with ``fixpoint operators``: functions that either\n",
" - Take a point and follow the lines upwards\n",
" - Stay in the same place\n",
"- Conditions above can be relaxed, but at the cost of complexity\n",
"\n",
"## Lattice programming\n",
"- Partial solutions\n",
"- Guaranteed termination\n",
"- Very modular / elegant\n",
"- A sane way to define recursive functions\n",
"- Not an implementation, just an underlying theory to build upon\n",
"\n",
"## Example Visualization\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [],
"source": [
"from graphviz import Digraph\n",
"from IPython.display import Image\n",
"\n",
"def new_graph(name: str):\n",
" return Digraph(\n",
" name=name,\n",
" node_attr={\"shape\": \"circle\", \"style\": \"filled\"},\n",
" graph_attr={\"bgcolor\": \"transparent\", \"splines\": \"line\", \"rankdir\": \"BT\"},\n",
" edge_attr={\"arrowsize\": \"0.75\"},\n",
" strict=True,\n",
" )\n",
"\n",
"def render(dot: Digraph):\n",
" dot.render(filename=dot.name, format=\"png\", directory=\"./graphs\")\n",
" return f\"graphs/{dot.name}.png\"\n"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAwoAAAMMCAYAAADzYVA8AAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nOzdd3hUZd7/8fdpkwAJNQoYFEGkCNJUUAIqNhCxIPbeXf1ZHn10LWtZdVV4FFHERUTsLq5iQVCUVQQBZUVBsQFiASnSW2g57ffHAUWlk+SezHxe15VrdTJz5jNxdu75nnPf9xdERERERERERERERERERERERERERERERER2nWU6gIiIpL0UUA/YHai1yU8VIG/DfapsuN9aYN2G25YDq4ElwFJgMTB/w09UTtlFRGQnqVAQEZGNCoH9gZbAfp7nNQEaBEGwexzHv44XnueF+fn5YeXKlePKlSsDUKlSJSsnJ8das2ZNXFJSEgMUFxezevVqq7i42A3D0N74eNu2A8/z5odhODMIgu+ALzf8TAWWldurFRGRrVKhICKSnaoA7YAi13U7AQcHQVAVoGbNmiVNmjRx6tev7+yxxx4UFhZSt25dCgoKqFGjBrm5uTv8ZMXFxSxbtoxFixYxd+5c5s2bx9y5c/npp5/86dOnW2vWrHEBUqnUL77vj4/jeDwwAfgcCErtVYuIyHZToSAikh1soC3Q1fO8E4IgaBvHsVNQUFBy0EEHea1atbIaN25M48aNqVatWrmHW7BgAd999x3Tpk1j8uTJ4ZQpU6Li4mLPcZx1lmWNDYJgBPAOMLPcw4mIZCkVCiIimcsDjrRt+zTbtk8KgqBGzZo1Sw477DDvkEMOsdq2bUvt2rVNZ9ysOI754YcfmDJlChMmTIgmTJgQrV271k2lUrNLSkr+DbwMfGo6p4hIJlOhICKSWSygk23b59q2fWoQBNWaNWvmd+3a1evYsSONGzc2nW+nhGHIlClTGDduHCNHjgzmz5/vplKpOSUlJc8DzwHTTGcUEck0KhRERDLDbsB5nudd6ft+w3333dfv3r2716VLFwoLC01nK3Vff/017777LiNGjPAXLVrkeZ430ff9fwJDSXZeEhGRXaRCQUSkYmtt2/YNwOk5OTmccMIJ7imnnELTpk1N5yoXURQxceJEhg4dGo0ePRrLslYHQTAA6AfMNZ1PRKQiU6EgIlLxWEBX13VvDIKgc8OGDf0LLrjAO/bYY3dqR6JMsXTpUl577TWee+45f8WKFTbwUhRFDwBfmM4mIlIRqVAQEalYjvI87wHf91u3atUqvOSSS5zDDjsMy9LH+Ua+7zNy5EgGDx7s//DDD57jOKPDMLwRmGw6m4hIRaKRRUSkYujseV4v3/fbdezYMbjmmmvcZs2amc6U1uI4ZsyYMTz66KPBzJkzHdu2Xw3D8DZguulsIiIVgQoFEZH01shxnIfCMDy+ffv2wTXXXOO2bNnSdKYKJY5j/vOf//Doo4/6s2fPtqMoehS4G3WBFhHZKhUKIiLpKR+4zbbt6/bcc09uueUWr6ioyHSmCi2KIl599VUefvjhYM2aNauDILgFeAIITWcTEUlHKhRERNJPd9d1B+Xk5BRcc8017umnn47jOKYzZYxVq1bx+OOP8+KLL0a2bX/u+/4FwJemc4mIpBuNPCIi6aOm4zj94jh+8Mgjj6z8xBNPOO3atcO2bdO5MkpOTg5FRUUcddRR1tSpU3dbtGjRFUBlYDwQGI4nIpI2dEVBRCQ9nOi67jM1atSoctddd3mdOnUynScrRFHEkCFD6Nu3bxhF0fe+758KTDWdS0QkHeiKgoiIWVVs234sjuMHTjzxxNRjjz3mNmrUyHSmrGFZFi1btqR79+721KlTqy1cuPCyOI6Lgf+aziYiYpoKBRERc1p5njcmNzf38N69e9uXXHKJlUqlTGfKSvn5+Zx44om267rOZ599drTruodGUTQSWGM6m4iIKZp6JCJixtm2bQ9u3bq188ADD7i777676TyywZdffsl1113nL1myZHEQBCcCk0xnEhExQVcURETKlwvcBzzUs2dP58EHH3Ty8/NNZ5JN1K5dmxNPPNH5+uuvK8+bN++iOI5XoqlIIpKFVCiIiJSffNd1hzuOc/bdd99tX3755ZZ2NEpPubm5dO/e3U6lUvZ///vfrrZtF8ZxPBKITGcTESkvKhRERMpHoed5Y/Pz89s+/fTTbseOHU3nkW2wLIu2bdtajRo1st5///3WlmUdEMfxMMA3nU1EpDxojYKISNlr6zjOu3vvvXe1gQMHerVr1zadR3bQlClTuPLKK4P169d/6ft+V2Ch6UwiImVN17xFRMpWkeM4Hx500EE1XnjhBRUJFVSbNm146aWX3N12262F53kfAYWmM4mIlDVdURARKTuHOY4zsqioKPXQQw85OTk5pvPILlqyZAkXX3xxMGvWrIVBEBwGzDSdSUSkrOiKgohI2TjWtu1RxxxzTE6/fv1UJGSIWrVq8fTTT7v77LPPbq7rTgAam84kIlJWVCiIiJS+I2zbfuOEE05we/XqZTuO9o3IJDVq1ODpp5/2mjRpUtN13bFAfdOZRETKgqYeiYiUroMdxxnduXPnnD59+tja/jRzFRcXc8EFFwQ//PDDPN/3DwHmmc4kIlKaVCiIiJSe1o7jjOvUqVOlvn37Oq7rms4jZWzp0qWce+65/vz583/cUCwsNZ1JRKS0qFAQESkde7qu+2mbNm1qPv74424qlTKdR8rJwoULOeOMM/xly5ZNCoLgCGC96UwiIqVBhYKIyK7L9zxvYmFh4b7/+te/vPz8fNN5pJx9//33nHXWWcH69etfD8PwdCA2nUlEZFdphZ2IyK5xXdd9Kz8/v82zzz7rFRQUmM4jBtSsWZMWLVrYb731VrM4jl3gA9OZRER2lQoFEZFd08txnDMHDx7s7rPPPqaziEF77rknBQUF1tixYzsBU4FppjOJiOwKFQoiIjvvJOCRu+66yz700ENNZ5E0sN9++7Fo0SKmT59+YhzHrwOLTGcSEdlZ2rdPRGTnNHMc58XTTz+dk046yXQWSSO33HKL1bRpU8/zvDeAPNN5RER2lhYzi4jsuFzP86Y0bdq00bPPPut6nmc6j6SZX375hZ49e/qrV68eEobh+abziIjsDE09EhHZQbZtP+i67rGDBg1ya9asaTqOpKG8vDyaNGniDB8+vBUwHfjKdCYRkR2lQkFEZMccHcdx/3vuucdu166d6SySxvbaay8WL14cT58+vWscxy8CK0xnEhHZEVqjICKy/aq7rvti165d4+OPP950FqkA/vrXv1r16tXLcV33WTTdV0QqGF1REBHZTrZt969cufLBjz/+uFOpUiXTcaQCcF2Xli1bOkOHDq0PzAY+N51JRGR7qVAQEdk+h8Vx/Mh9993ntGjRwnQWqUB23313Vq5cyddff310HMfPAKtMZxIR2R6aeiQism25nuc9c9hhh0VdunQxnUUqoKuvvtqqVatWynGcfqaziIhsL11REBHZtps8zztx4MCBTl6etsWXHed5Hg0bNnRGjBixH/Ah8JPhSCIi26QrCiIiW1fHcZy/XXbZZU7t2rVNZ5EKrFOnThQVFYWe5z0Ou
KbziIhsiwoFEZGtsG37gRo1arjnnXee6SiSAW666SYniqJGwMWms4iIbIsKBRGRLWsbx/HZN998s5ebm2s6i2SABg0acOaZZ9qu694PaB6biKQ1FQoiIlvguu69jRs3Do455hjTUSSDXH755biumw9cbTqLiMjWqFAQEdm8A4Mg6HLdddd5lqU+WVJ6qlevzvnnn+86jnMrUN10HhGRLVGhICKyGa7r9t5///2DoqIi01EkA11wwQXkJvPZrjWdRURkS1QoiIj8WbsgCI649tprPdNBJDPl5eVx4YUXuo7j3ADkm84jIrI5KhRERP7AcZwbmzRp4rdv3950FMlgZ555Jq7r5qIdkEQkTalQEBH5vfpRFJ180UUX6WqClKmqVavSs2dP13XdG1FfBRFJQyoURER+79patWqFXbp0MZ1DssC5555LFEV1gVNMZxER+SMVCiIiv6nkOM6l5557ruc4jukskgXq1atH586dI8/ztFWqiKQdFQoiIr85Dah80kknmc4hWeSUU05xfN/vADQ
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from more_itertools import powerset, distinct_combinations\n",
"\n",
"graph = new_graph(\"graph1\")\n",
"\n",
"domain = {1, 2, 3, 4}\n",
"domain_subsets = tuple((str(i), set(subset)) for i, subset in enumerate(powerset(domain)))\n",
"\n",
"\n",
"for label, subset in domain_subsets:\n",
" graph.node(name=label, label=str(subset))\n",
"\n",
"for (labelA, subsetA), (labelB, subsetB) in distinct_combinations(domain_subsets, 2):\n",
" if len(subsetB) - len(subsetA) == 1 and subsetA.issubset(subsetB):\n",
" graph.edge(labelA, labelB)\n",
"\n",
"\n",
"Image(render(graph))"
]
},
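{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the fixpoint idea on the lattice just drawn (the names `close_under_sum` and `naive_fixpoint` are made up for illustration and are not used later): an operator that only ever adds elements, or leaves its input alone, moves upwards along the edges of the diagram, so on a finite lattice repeated application must eventually stop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: an \"upward-moving\" operator on the subsets of {1, 2, 3, 4}.\n",
"# Hypothetical rule: if two members sum to something in the base set, add the sum.\n",
"# It never removes anything, so each step either climbs the lattice or stays put.\n",
"def close_under_sum(s: frozenset) -> frozenset:\n",
"    base = {1, 2, 3, 4}\n",
"    sums = {a + b for a in s for b in s if a + b in base}\n",
"    return s | sums\n",
"\n",
"def naive_fixpoint(op, start):\n",
"    # Apply op until nothing changes; guaranteed to terminate because the\n",
"    # lattice is finite and op moves consistently in one direction.\n",
"    while (nxt := op(start)) != start:\n",
"        start = nxt\n",
"    return start\n",
"\n",
"naive_fixpoint(close_under_sum, frozenset({1, 2}))  # frozenset({1, 2, 3, 4})"
]
},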
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example Fixpoint Operator\n",
"\n",
"- Computation can start at the TOP or BOTTOM and move downwards or upwards in the lattice\n",
"- An operator is just a function that takes and element and moves in a direction consistently\n",
"\n",
"## Problem\n",
"\n",
"- A diagnostic approach to parts of speech tagging (POS tagging)\n",
" - Diagnostics propose a tag, then possibly refute it\n",
"- POS: noun, verb, adverb, subject\n",
"### Lattice\n",
"- Domain: pairs (word, tag) where word is a word and tag is the word's pos\n",
"- Computation: start with the TOP, apply every operator in composition until a fixpoint is reached"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(('the', 'DET'), ('the', 'NOUN(OBJ)'), ('the', 'NOUN(SUBJECT)'), ('the', 'VERB'), ('quick', 'DET'), ('quick', 'NOUN(OBJ)'), ('quick', 'NOUN(SUBJECT)'), ('quick', 'VERB'), ('brown', 'DET'), ('brown', 'NOUN(OBJ)'), ('brown', 'NOUN(SUBJECT)'), ('brown', 'VERB'), ('fox', 'DET'), ('fox', 'NOUN(OBJ)'), ('fox', 'NOUN(SUBJECT)'), ('fox', 'VERB'), ('jumps', 'DET'), ('jumps', 'NOUN(OBJ)'), ('jumps', 'NOUN(SUBJECT)'), ('jumps', 'VERB'), ('over', 'DET'), ('over', 'NOUN(OBJ)'), ('over', 'NOUN(SUBJECT)'), ('over', 'VERB'), ('the', 'DET'), ('the', 'NOUN(OBJ)'), ('the', 'NOUN(SUBJECT)'), ('the', 'VERB'), ('lazy', 'DET'), ('lazy', 'NOUN(OBJ)'), ('lazy', 'NOUN(SUBJECT)'), ('lazy', 'VERB'), ('dog', 'DET'), ('dog', 'NOUN(OBJ)'), ('dog', 'NOUN(SUBJECT)'), ('dog', 'VERB'))\n",
"There are 36 elements in the domain\n",
"Which means there are (2^36 = 68719476736) possible subsets\n",
"Can't be visualized, but can easily be encoded with bitsets/bitflags\n"
]
}
],
"source": [
"from itertools import product\n",
"\n",
"words = (\"the\", \"quick\", \"brown\", \"fox\", \"jumps\", \"over\", \"the\", \"lazy\", \"dog\")\n",
"tags = (\"DET\", \"NOUN(OBJ)\", \"NOUN(SUBJECT)\", \"VERB\")\n",
"\n",
"\n",
"domain = tuple(product(words, tags))\n",
"\n",
"print(domain)\n",
"print(f\"There are {len(domain)} elements in the domain\")\n",
"print(f\"Which means there are (2^{len(domain)} = {2**len(domain)}) possible subsets\")\n",
"print(\"Can't be visualized, but can easily be encoded with bitsets/bitflags\")\n"
]
},
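{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the bitset encoding mentioned in the cell above (an aside, not used by the rest of the notebook): give each pair in `domain` one bit, so a lattice element becomes a single int and subset, union and intersection become bitwise operations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: encode a set of (word, tag) pairs as an int bitmask.\n",
"# Duplicate pairs in `domain` (the sentence contains 'the' twice) collapse onto\n",
"# one bit, just as they collapse into one element of a set.\n",
"bit_of = {pair: 1 << i for i, pair in enumerate(domain)}\n",
"\n",
"def encode(element) -> int:\n",
"    mask = 0\n",
"    for pair in element:\n",
"        mask |= bit_of[pair]\n",
"    return mask\n",
"\n",
"def decode(mask: int) -> set:\n",
"    return {pair for pair, bit in bit_of.items() if mask & bit}\n",
"\n",
"a = encode({(\"fox\", \"NOUN(SUBJECT)\"), (\"jumps\", \"VERB\")})\n",
"b = encode({(\"jumps\", \"VERB\")})\n",
"print((a & b) == b)   # True: b is a subset of a\n",
"print(decode(a | b))  # the union decodes back to the original pairs"
]
},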
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linguistic Diagnosis\n",
"\n",
"- Method stolen from classical linguistics ([uoa reference](https://ezpa.library.ualberta.ca/ezpAuthen.cgi?url=https://academic.oup.com/book/12008/chapter-abstract/161275349?redirectedFrom=fulltext)) ([non uoa, gated](https://academic.oup.com/book/12008/chapter/161275349?login=true))\n",
"- Rooted in generative grammar\n",
"- Don't want to misrepresent it, its a lot more sophisticated than what I'm presenting here.\n",
"- General principle is apply rules to rule out possible interpretations\n",
"- I'm not an expert of linguistic diagnosis, I just think it's cool\n",
"- Given a template sentence, decide whether it's true when instantiated\n",
"- E.g. \"The dog kicked the ball\" -> (\"Who kicked the ball?\", \"Dog\") is true, then dog is NOUN(SUBJECT) \n",
"- Oracle-driven\n",
" - In classical linguistics, the linguist is the oracle\n",
" - Oracle can be a trained NN\n",
" - Or a corpus/database\n",
" - ... doesn't really matter"
]
},
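{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of what \"oracle-driven\" could look like in code (the names `make_diagnostic` and `toy_oracle` and the question template are invented for illustration): a diagnostic is built from a question template plus any callable that answers it, and it removes the (word, tag) pairs the oracle refutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: build a refuting diagnostic from a question template and an oracle.\n",
"# The oracle is any callable answering a yes/no question about the sentence;\n",
"# here a hard-coded stand-in plays the role of a linguist, a neural net, or a corpus.\n",
"def make_diagnostic(tag, question_template, oracle):\n",
"    def diagnostic(element):\n",
"        return {\n",
"            (word, t) for word, t in element\n",
"            if t != tag or oracle(question_template.format(word=word))\n",
"        }\n",
"    return diagnostic\n",
"\n",
"def toy_oracle(question: str) -> bool:\n",
"    # Stand-in oracle: only 'knows' that 'fox' can act as the subject.\n",
"    return question == \"Can 'fox' act as the subject?\"\n",
"\n",
"subject_diag = make_diagnostic(\"NOUN(SUBJECT)\", \"Can '{word}' act as the subject?\", toy_oracle)\n",
"print(subject_diag({(\"fox\", \"NOUN(SUBJECT)\"), (\"over\", \"NOUN(SUBJECT)\"), (\"over\", \"VERB\")}))\n",
"# keeps ('fox', 'NOUN(SUBJECT)') and ('over', 'VERB'), drops ('over', 'NOUN(SUBJECT)')"
]
},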
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(('the', 'DET'), ('the', 'NOUN(OBJ)'), ('quick', 'DET'), ('quick', 'NOUN(OBJ)'), ('brown', 'DET'), ('brown', 'NOUN(OBJ)'))\n",
"{('quick', 'NOUN(OBJ)'), ('fox', 'NOUN(SUBJECT)'), ('quick', 'DET'), ('the', 'NOUN(OBJ)'), ('brown', 'DET'), ('the', 'DET')}\n"
]
}
],
"source": [
"# Diagnostic functions, lattice operators\n",
"\n",
"LatticeElement = set[tuple[str, str]]\n",
"\n",
"def adjacent_right_tags(word: str, element: LatticeElement) -> tuple[str]:\n",
" i = words.index(word)\n",
" if i >= len(words)-1:\n",
" return tuple()\n",
" word = words[i+1]\n",
" return tuple(tag for oword, tag in element if word == oword)\n",
"\n",
"def jumps_is_verb(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag)\n",
" for word, tag in element\n",
" if word != \"jumps\" or tag == \"VERB\"\n",
" }\n",
"\n",
"#print(jumps_is_verb({(\"the\", \"DET\"), (\"the\", \"VERB\"), (\"jumps\", \"NOUN(SUBJECT)\"), (\"jumps\", \"VERB\")}))\n",
"\n",
"# Don't consider anything a DET if it doesn't come before a NOUN(SUBJECT), NOUN(OBJECT)\n",
"def remove_det_nonoun(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag) for word, tag in element\n",
" if tag != \"DET\" or {\"NOUN(OBJ)\", \"NOUN(SUBJECT)\"}.intersection(adjacent_right_tags(word, element))\n",
" }\n",
"\n",
"# Can't have two nouns next to eachother (e.g. dog cat)\n",
"def remove_noun_nodoubles(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag) for word, tag in element\n",
" if tag not in (\"NOUN(OBJ)\", \"NOUN(SUBJECT)\") or not ({\"NOUN(OBJ)\", \"NOUN(SUBJECT)\"}.issuperset(x := adjacent_right_tags(word, element)) and len(x) != 0)\n",
" }\n",
"\n",
"\n",
"from itertools import chain\n",
"first_three = words[:3]\n",
"first_three_nouns_dets = tuple(chain.from_iterable(((word, \"DET\"), (word, \"NOUN(OBJ)\")) for word in first_three))\n",
"print(first_three_nouns_dets)\n",
"print(remove_noun_nodoubles({*first_three_nouns_dets, (\"fox\", \"NOUN(SUBJECT)\")}))\n",
"\n",
"# A noun always comes right before a verb\n",
"def noun_immediately_precedes_verb(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag) for word, tag in element\n",
" if tag in (\"NOUN(OBJ)\", \"NOUN(SUBJECT)\") or (\"VERB\",) != adjacent_right_tags(word, element)\n",
" }\n",
"\n",
"#print(noun_immediately_precedes_verb({(\"fox\", \"DET\"), (\"fox\", \"NOUN(OBJ)\"), (\"jumps\", \"VERB\")}))\n",
"\n"
]
},
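{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check, as an aside (not part of the original flow): each diagnostic above is built as a comprehension over its input, so it can only remove pairs, never add them. That downward-only movement is what guarantees the fixpoint iteration in the next cell terminates. The check below just confirms it on the top element."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: every diagnostic moves down the lattice (or stays put) on the top element.\n",
"checks = [jumps_is_verb, remove_det_nonoun, remove_noun_nodoubles, noun_immediately_precedes_verb]\n",
"top_element = set(domain)\n",
"for op in checks:\n",
"    assert op(top_element) <= top_element  # subset: nothing was added\n",
"print(\"All diagnostics only remove pairs\")"
]
},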
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The lattice top element, every word has every tag:\n",
"Start size = 32\n",
"Calling with noun_immediately_precedes_verb\n",
"noun_immediately_precedes_verb removed 0 elements\n",
"set()\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 1 elements\n",
"{('dog', 'DET')}\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 3 elements\n",
"{('jumps', 'DET'), ('jumps', 'NOUN(OBJ)'), ('jumps', 'NOUN(SUBJECT)')}\n",
"Iteration 0 size = 28\n",
"{('brown', 'DET'),\n",
" ('brown', 'NOUN(OBJ)'),\n",
" ('brown', 'NOUN(SUBJECT)'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'DET'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('fox', 'VERB'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'DET'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}\n",
"Calling with noun_immediately_precedes_verb\n",
"noun_immediately_precedes_verb removed 2 elements\n",
"{('fox', 'VERB'), ('fox', 'DET')}\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 2 elements\n",
"{('brown', 'NOUN(SUBJECT)'), ('brown', 'NOUN(OBJ)')}\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 1 elements\n",
"{('quick', 'DET')}\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 0 elements\n",
"set()\n",
"Iteration 1 size = 23\n",
"{('brown', 'DET'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}\n",
"Calling with noun_immediately_precedes_verb\n",
"noun_immediately_precedes_verb removed 0 elements\n",
"set()\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 0 elements\n",
"set()\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 0 elements\n",
"set()\n",
"Reached Fixedpoint\n"
]
},
{
"data": {
"text/plain": [
"{('brown', 'DET'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Every word needs at least one part of speech\n",
"def consistent(element: LatticeElement) -> bool:\n",
" return len(set(word for word, _ in element)) == len(set(words))\n",
"\n",
"print(\"The lattice top element, every word has every tag:\")\n",
"assert consistent(domain)\n",
"assert not consistent({(\"jumps\", \"VERB\")})\n",
"\n",
"# Every word has exactly one tag\n",
"def model(element: LatticeElement) -> bool:\n",
" return consistent(element) and len(element) == len(set(words))\n",
"\n",
"assert not model(domain)\n",
"assert model({(word, \"DET\") for word in words})\n",
"\n",
"operators = [jumps_is_verb, remove_det_nonoun, remove_noun_nodoubles, noun_immediately_precedes_verb]\n",
"\n",
"def with_info(operator, x: LatticeElement) -> LatticeElement:\n",
" print(f\"Calling with {operator.__name__}\")\n",
" result = operator(x)\n",
" print(f\"{operator.__name__} removed {len(x) - len(result)} elements\")\n",
" print(x.difference(result))\n",
" return result\n",
"\n",
"def compose(f1, f2):\n",
" return lambda x: f1(f2(x))\n",
"\n",
"from functools import reduce, partial\n",
"from pprint import pprint\n",
"all_operator = reduce(compose, map(partial(partial, with_info), operators))\n",
"\n",
"def fixpoint(op, start):\n",
" print(f\"Start size = {len(start)}\")\n",
" i = 0\n",
" while (succ := op(start)) != start:\n",
" print(f\"Iteration {i} size = {len(succ)}\")\n",
" pprint(succ)\n",
" start = succ\n",
" i += 1\n",
" print(\"Reached Fixedpoint\")\n",
" return start\n",
"\n",
"\n",
"fixpoint(all_operator, set(domain))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"1. Build a sentence for which the above operators produce inconsistent results\n",
"1. Construct more operators that complete the sentence\n",
"1. Rebuild the code so that it starts with the empty set and adds the POS it refutes"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Further Ideas\n",
"## A fuzzier approach\n",
"\n",
"- We're expecting a lot from a our diagnostics to have to \"entirely refute\" a POS, a graded approach would be better\n",
"- That's possible!\n",
"- Now we have triples (word, tag, confidence) and give the diagnostics the ability to lower a confidence level\n",
"- But this is no longer a simple subset relation, so it needs more general-purpose lattice theory."
]
}
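,
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A tiny sketch of the graded variant (names invented, and only a sketch: with real-valued confidences the descending chains are no longer finite, which is exactly where the more general lattice theory is needed): elements map (word, tag) pairs to a confidence, the order is pointwise `<=`, and operators may only lower confidences."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: a graded lattice element maps (word, tag) pairs to confidences.\n",
"# The order is pointwise; lowering a confidence plays the role that removing\n",
"# a pair played in the subset version.\n",
"Graded = dict  # dict[tuple[str, str], float], kept simple for the sketch\n",
"\n",
"def lower(element: Graded, pair, factor: float) -> Graded:\n",
"    # Copy the element and scale one confidence down; never raises any value.\n",
"    out = dict(element)\n",
"    out[pair] = out[pair] * factor\n",
"    return out\n",
"\n",
"def leq(a: Graded, b: Graded) -> bool:\n",
"    # a <= b iff they cover the same pairs and every confidence in a is <= the one in b.\n",
"    return set(a) == set(b) and all(a[k] <= b[k] for k in b)\n",
"\n",
"top = {(\"jumps\", \"VERB\"): 1.0, (\"jumps\", \"DET\"): 1.0}\n",
"doubt_det = lower(top, (\"jumps\", \"DET\"), 0.25)\n",
"print(leq(doubt_det, top))  # True: lowering a confidence moves down the lattice"
]
}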
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}