lattice-programming/lattices.ipynb

2023-01-18 21:29:32 -07:00
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lattice Programming\n",
"\n",
"## What is a lattice\n",
"\n",
"- Very generic math structure\n",
"- For practical purposes: just a **finite** set with a subset relation\n",
"- Paired with ``fixpoint operators``: functions that either\n",
2023-01-19 10:53:15 -07:00
" - Follow the lines upwards\n",
" - Stop\n",
2023-01-18 21:29:32 -07:00
"- Conditions above can be relaxed, but at the cost of complexity\n",
"\n",
"## Lattice programming\n",
"- Partial solutions\n",
2023-01-19 10:53:15 -07:00
"- Guaranteed termination (through monotonicity)\n",
2023-01-18 21:29:32 -07:00
"- Very modular / elegant\n",
2023-01-19 10:53:15 -07:00
"- Sane recursive functions - no induction\n",
"- Theory, not implementation\n",
2023-01-18 21:29:32 -07:00
"\n",
"## Example Visualization\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [],
"source": [
"from graphviz import Digraph\n",
"from IPython.display import Image\n",
"\n",
"def new_graph(name: str):\n",
" return Digraph(\n",
" name=name,\n",
" node_attr={\"shape\": \"circle\", \"style\": \"filled\"},\n",
" graph_attr={\"bgcolor\": \"transparent\", \"splines\": \"line\", \"rankdir\": \"BT\"},\n",
" edge_attr={\"arrowsize\": \"0.75\"},\n",
" strict=True,\n",
" )\n",
"\n",
"def render(dot: Digraph):\n",
" dot.render(filename=dot.name, format=\"png\", directory=\"./graphs\")\n",
" return f\"graphs/{dot.name}.png\"\n"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAwoAAAMMCAYAAADzYVA8AAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nOzdd3hUZd7/8fdpkwAJNQoYFEGkCNJUUAIqNhCxIPbeXf1ZHn10LWtZdVV4FFHERUTsLq5iQVCUVQQBZUVBsQFiASnSW2g57ffHAUWlk+SezHxe15VrdTJz5jNxdu75nnPf9xdERERERERERERERERERERERERERERERER2nWU6gIiIpL0UUA/YHai1yU8VIG/DfapsuN9aYN2G25YDq4ElwFJgMTB/w09UTtlFRGQnqVAQEZGNCoH9gZbAfp7nNQEaBEGwexzHv44XnueF+fn5YeXKlePKlSsDUKlSJSsnJ8das2ZNXFJSEgMUFxezevVqq7i42A3D0N74eNu2A8/z5odhODMIgu+ALzf8TAWWldurFRGRrVKhICKSnaoA7YAi13U7AQcHQVAVoGbNmiVNmjRx6tev7+yxxx4UFhZSt25dCgoKqFGjBrm5uTv8ZMXFxSxbtoxFixYxd+5c5s2bx9y5c/npp5/86dOnW2vWrHEBUqnUL77vj4/jeDwwAfgcCErtVYuIyHZToSAikh1soC3Q1fO8E4IgaBvHsVNQUFBy0EEHea1atbIaN25M48aNqVatWrmHW7BgAd999x3Tpk1j8uTJ4ZQpU6Li4mLPcZx1lmWNDYJgBPAOMLPcw4mIZCkVCiIimcsDjrRt+zTbtk8KgqBGzZo1Sw477DDvkEMOsdq2bUvt2rVNZ9ysOI754YcfmDJlChMmTIgmTJgQrV271k2lUrNLSkr+DbwMfGo6p4hIJlOhICKSWSygk23b59q2fWoQBNWaNWvmd+3a1evYsSONGzc2nW+nhGHIlClTGDduHCNHjgzmz5/vplKpOSUlJc8DzwHTTGcUEck0KhRERDLDbsB5nudd6ft+w3333dfv3r2716VLFwoLC01nK3Vff/017777LiNGjPAXLVrkeZ430ff9fwJDSXZeEhGRXaRCQUSkYmtt2/YNwOk5OTmccMIJ7imnnELTpk1N5yoXURQxceJEhg4dGo0ePRrLslYHQTAA6AfMNZ1PRKQiU6EgIlLxWEBX13VvDIKgc8OGDf0LLrjAO/bYY3dqR6JMsXTpUl577TWee+45f8WKFTbwUhRFDwBfmM4mIlIRqVAQEalYjvI87wHf91u3atUqvOSSS5zDDjsMy9LH+Ua+7zNy5EgGDx7s//DDD57jOKPDMLwRmGw6m4hIRaKRRUSkYujseV4v3/fbdezYMbjmmmvcZs2amc6U1uI4ZsyYMTz66KPBzJkzHdu2Xw3D8DZguulsIiIVgQoFEZH01shxnIfCMDy+ffv2wTXXXOO2bNnSdKYKJY5j/vOf//Doo4/6s2fPtqMoehS4G3WBFhHZKhUKIiLpKR+4zbbt6/bcc09uueUWr6ioyHSmCi2KIl599VUefvjhYM2aNauDILgFeAIITWcTEUlHKhRERNJPd9d1B+Xk5BRcc8017umnn47jOKYzZYxVq1bx+OOP8+KLL0a2bX/u+/4FwJemc4mIpBuNPCIi6aOm4zj94jh+8Mgjj6z8xBNPOO3atcO2bdO5MkpOTg5FRUUcddRR1tSpU3dbtGjRFUBlYDwQGI4nIpI2dEVBRCQ9nOi67jM1atSoctddd3mdOnUynScrRFHEkCFD6Nu3bxhF0fe+758KTDWdS0QkHeiKgoiIWVVs234sjuMHTjzxxNRjjz3mNmrUyHSmrGFZFi1btqR79+721KlTqy1cuPCyOI6Lgf+aziYiYpoKBRERc1p5njcmNzf38N69e9uXXHKJlUqlTGfKSvn5+Zx44om267rOZ599drTruodGUTQSWGM6m4iIKZp6JCJixtm2bQ9u3bq188ADD7i777676TyywZdffsl1113nL1myZHEQBCcCk0xnEhExQVcURETKlwvcBzzUs2dP58EHH3
Ty8/NNZ5JN1K5dmxNPPNH5+uuvK8+bN++iOI5XoqlIIpKFVCiIiJSffNd1hzuOc/bdd99tX3755ZZ2NEpPubm5dO/e3U6lUvZ///vfrrZtF8ZxPBKITGcTESkvKhRERMpHoed5Y/Pz89s+/fTTbseOHU3nkW2wLIu2bdtajRo1st5///3WlmUdEMfxMMA3nU1EpDxojYKISNlr6zjOu3vvvXe1gQMHerVr1zadR3bQlClTuPLKK4P169d/6ft+V2Ch6UwiImVN17xFRMpWkeM4Hx500EE1XnjhBRUJFVSbNm146aWX3N12262F53kfAYWmM4mIlDVdURARKTuHOY4zsqioKPXQQw85OTk5pvPILlqyZAkXX3xxMGvWrIVBEBwGzDSdSUSkrOiKgohI2TjWtu1RxxxzTE6/fv1UJGSIWrVq8fTTT7v77LPPbq7rTgAam84kIlJWVCiIiJS+I2zbfuOEE05we/XqZTuO9o3IJDVq1ODpp5/2mjRpUtN13bFAfdOZRETKgqYeiYiUroMdxxnduXPnnD59+tja/jRzFRcXc8EFFwQ//PDDPN/3DwHmmc4kIlKaVCiIiJSe1o7jjOvUqVOlvn37Oq7rms4jZWzp0qWce+65/vz583/cUCwsNZ1JRKS0qFAQESkde7qu+2mbNm1qPv74424qlTKdR8rJwoULOeOMM/xly5ZNCoLgCGC96UwiIqVBhYKIyK7L9zxvYmFh4b7/+te/vPz8fNN5pJx9//33nHXWWcH69etfD8PwdCA2nUlEZFdphZ2IyK5xXdd9Kz8/v82zzz7rFRQUmM4jBtSsWZMWLVrYb731VrM4jl3gA9OZRER2lQoFEZFd08txnDMHDx7s7rPPPqaziEF77rknBQUF1tixYzsBU4FppjOJiOwKFQoiIjvvJOCRu+66yz700ENNZ5E0sN9++7Fo0SKmT59+YhzHrwOLTGcSEdlZ2rdPRGTnNHMc58XTTz+dk046yXQWSSO33HKL1bRpU8/zvDeAPNN5RER2lhYzi4jsuFzP86Y0bdq00bPPPut6nmc6j6SZX375hZ49e/qrV68eEobh+abziIjsDE09EhHZQbZtP+i67rGDBg1ya9asaTqOpKG8vDyaNGniDB8+vBUwHfjKdCYRkR2lQkFEZMccHcdx/3vuucdu166d6SySxvbaay8WL14cT58+vWscxy8CK0xnEhHZEVqjICKy/aq7rvti165d4+OPP950FqkA/vrXv1r16tXLcV33WTTdV0QqGF1REBHZTrZt969cufLBjz/+uFOpUiXTcaQCcF2Xli1bOkOHDq0PzAY+N51JRGR7qVAQEdk+h8Vx/Mh9993ntGjRwnQWqUB23313Vq5cyddff310HMfPAKtMZxIR2R6aeiQism25nuc9c9hhh0VdunQxnUUqoKuvvtqqVatWynGcfqaziIhsL11REBHZtps8zztx4MCBTl6etsWXHed5Hg0bNnRGjBixH/Ah8JPhSCIi26QrCiIiW1fHcZy/XXbZZU7t2rVNZ5EKrFOnThQVFYWe5z0OuKbziIhsiwoFEZGtsG37gRo1arjnnXee6SiSAW666SYniqJGwMWms4iIbIsKBRGRLWsbx/HZN998s5ebm2s6i2SABg0acOaZZ9qu694PaB6biKQ1FQoiIlvguu69jRs3Do455hjTUSSDXH755biumw9cbTqLiMjWqFAQEdm8A4Mg6HLdddd5lqU+WVJ6qlevzvnnn+86jnMrUN10HhGRLVGhICKyGa7r9t5///2DoqIi01EkA11wwQXkJvPZrjWdRURkS1QoiIj8WbsgCI649tprPdNBJDPl5eVx4YUXuo7j3ADkm84jIrI5KhRERP7AcZwbmzRp4rdv3950FMlgZ555Jq7r5qIdkEQkTalQEBH5vfpRFJ180UUX6WqClKmqVavSs2dP13XdG1FfBRFJQyoURER+79patWqFXbp0MZ1DssC5555LFEV1gVNMZx
ER+SMVCiIiv6nkOM6l5557ruc4jukskgXq1atH586dI8/ztFWqiKQdFQoiIr85Dah80kknmc4hWeSUU05xfN/vADQ
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from more_itertools import powerset, distinct_combinations\n",
"\n",
"graph = new_graph(\"graph1\")\n",
"\n",
"domain = {1, 2, 3, 4}\n",
"domain_subsets = tuple((str(i), set(subset)) for i, subset in enumerate(powerset(domain)))\n",
"\n",
"\n",
"for label, subset in domain_subsets:\n",
" graph.node(name=label, label=str(subset))\n",
"\n",
"for (labelA, subsetA), (labelB, subsetB) in distinct_combinations(domain_subsets, 2):\n",
" if len(subsetB) - len(subsetA) == 1 and subsetA.issubset(subsetB):\n",
" graph.edge(labelA, labelB)\n",
"\n",
"\n",
"Image(render(graph))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Problem\n",
"\n",
"- A diagnosis approach to part-of-speech tagging (POS tagging)\n",
"  - Diagnostics propose a tag, then possibly refute it\n",
"- POS tags used here: determiner (DET), object noun (NOUN(OBJ)), subject noun (NOUN(SUBJECT)), verb (VERB)\n",
"\n",
"### Lattice\n",
"\n",
"- Domain: pairs (word, tag), where tag is a candidate POS for word\n",
"- Computation: start with the TOP (every word has every tag), apply every operator in composition until a fixpoint is reached"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(('the', 'DET'), ('the', 'NOUN(OBJ)'), ('the', 'NOUN(SUBJECT)'), ('the', 'VERB'), ('quick', 'DET'), ('quick', 'NOUN(OBJ)'), ('quick', 'NOUN(SUBJECT)'), ('quick', 'VERB'), ('brown', 'DET'), ('brown', 'NOUN(OBJ)'), ('brown', 'NOUN(SUBJECT)'), ('brown', 'VERB'), ('fox', 'DET'), ('fox', 'NOUN(OBJ)'), ('fox', 'NOUN(SUBJECT)'), ('fox', 'VERB'), ('jumps', 'DET'), ('jumps', 'NOUN(OBJ)'), ('jumps', 'NOUN(SUBJECT)'), ('jumps', 'VERB'), ('over', 'DET'), ('over', 'NOUN(OBJ)'), ('over', 'NOUN(SUBJECT)'), ('over', 'VERB'), ('the', 'DET'), ('the', 'NOUN(OBJ)'), ('the', 'NOUN(SUBJECT)'), ('the', 'VERB'), ('lazy', 'DET'), ('lazy', 'NOUN(OBJ)'), ('lazy', 'NOUN(SUBJECT)'), ('lazy', 'VERB'), ('dog', 'DET'), ('dog', 'NOUN(OBJ)'), ('dog', 'NOUN(SUBJECT)'), ('dog', 'VERB'))\n",
"There are 36 elements in the domain\n",
"Which means there are (2^36 = 68719476736) possible subsets\n",
"Can't be visualized, but can easily be encoded with bitsets/bitflags\n"
]
}
],
"source": [
"from itertools import product\n",
"\n",
"words = (\"the\", \"quick\", \"brown\", \"fox\", \"jumps\", \"over\", \"the\", \"lazy\", \"dog\")\n",
"tags = (\"DET\", \"NOUN(OBJ)\", \"NOUN(SUBJECT)\", \"VERB\")\n",
"\n",
"\n",
"domain = tuple(product(words, tags))\n",
"\n",
"print(domain)\n",
"print(f\"There are {len(domain)} elements in the domain\")\n",
"print(f\"Which means there are (2^{len(domain)} = {2**len(domain)}) possible subsets\")\n",
"print(\"Can't be visualized, but can easily be encoded with bitsets/bitflags\")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linguistic Diagnosis\n",
"\n",
"- Method stolen from classical linguistics ([uoa reference](https://ezpa.library.ualberta.ca/ezpAuthen.cgi?url=https://academic.oup.com/book/12008/chapter-abstract/161275349?redirectedFrom=fulltext)) ([non uoa, gated](https://academic.oup.com/book/12008/chapter/161275349?login=true))\n",
"- Rooted in generative grammar\n",
"- Simplified here\n",
"- Operate on template sentences:\n",
"  - E.g. \"The dog kicked the ball\": if the oracle accepts (\"Who kicked the ball?\", \"Dog\"), then dog is NOUN(SUBJECT)\n",
"- Oracle-driven\n",
" - In classical linguistics, the linguist is the oracle\n",
" - Oracle can be a trained NN\n",
" - Or a corpus/database\n",
" - ... doesn't really matter"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(('the', 'DET'), ('the', 'NOUN(OBJ)'), ('quick', 'DET'), ('quick', 'NOUN(OBJ)'), ('brown', 'DET'), ('brown', 'NOUN(OBJ)'))\n",
"{('quick', 'NOUN(OBJ)'), ('fox', 'NOUN(SUBJECT)'), ('quick', 'DET'), ('the', 'NOUN(OBJ)'), ('brown', 'DET'), ('the', 'DET')}\n"
]
}
],
"source": [
"# Diagnostic functions, lattice operators\n",
"\n",
"LatticeElement = set[tuple[str, str]]\n",
"\n",
"def adjacent_right_tags(word: str, element: LatticeElement) -> tuple[str]:\n",
" i = words.index(word)\n",
" if i >= len(words)-1:\n",
" return tuple()\n",
" word = words[i+1]\n",
" return tuple(tag for oword, tag in element if word == oword)\n",
"\n",
"def jumps_is_verb(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag)\n",
" for word, tag in element\n",
" if word != \"jumps\" or tag == \"VERB\"\n",
" }\n",
"\n",
"#print(jumps_is_verb({(\"the\", \"DET\"), (\"the\", \"VERB\"), (\"jumps\", \"NOUN(SUBJECT)\"), (\"jumps\", \"VERB\")}))\n",
"\n",
"# Don't consider anything a DET if it doesn't come before a NOUN(SUBJECT), NOUN(OBJECT)\n",
"def remove_det_nonoun(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag) for word, tag in element\n",
" if tag != \"DET\" or {\"NOUN(OBJ)\", \"NOUN(SUBJECT)\"}.intersection(adjacent_right_tags(word, element))\n",
" }\n",
"\n",
"# Can't have two nouns next to eachother (e.g. dog cat)\n",
"def remove_noun_nodoubles(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag) for word, tag in element\n",
" if tag not in (\"NOUN(OBJ)\", \"NOUN(SUBJECT)\") or not ({\"NOUN(OBJ)\", \"NOUN(SUBJECT)\"}.issuperset(x := adjacent_right_tags(word, element)) and len(x) != 0)\n",
" }\n",
"\n",
"\n",
"from itertools import chain\n",
"first_three = words[:3]\n",
"first_three_nouns_dets = tuple(chain.from_iterable(((word, \"DET\"), (word, \"NOUN(OBJ)\")) for word in first_three))\n",
"print(first_three_nouns_dets)\n",
"print(remove_noun_nodoubles({*first_three_nouns_dets, (\"fox\", \"NOUN(SUBJECT)\")}))\n",
"\n",
"# A noun always comes right before a verb\n",
"def noun_immediately_precedes_verb(element: LatticeElement) -> LatticeElement:\n",
" return {\n",
" (word, tag) for word, tag in element\n",
" if tag in (\"NOUN(OBJ)\", \"NOUN(SUBJECT)\") or (\"VERB\",) != adjacent_right_tags(word, element)\n",
" }\n",
"\n",
"#print(noun_immediately_precedes_verb({(\"fox\", \"DET\"), (\"fox\", \"NOUN(OBJ)\"), (\"jumps\", \"VERB\")}))\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The lattice top element, every word has every tag:\n",
"Start size = 32\n",
"Calling with noun_immediately_precedes_verb\n",
"noun_immediately_precedes_verb removed 0 elements\n",
"set()\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 1 elements\n",
"{('dog', 'DET')}\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 3 elements\n",
"{('jumps', 'DET'), ('jumps', 'NOUN(OBJ)'), ('jumps', 'NOUN(SUBJECT)')}\n",
"Iteration 0 size = 28\n",
"{('brown', 'DET'),\n",
" ('brown', 'NOUN(OBJ)'),\n",
" ('brown', 'NOUN(SUBJECT)'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'DET'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('fox', 'VERB'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'DET'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}\n",
"Calling with noun_immediately_precedes_verb\n",
"noun_immediately_precedes_verb removed 2 elements\n",
"{('fox', 'VERB'), ('fox', 'DET')}\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 2 elements\n",
"{('brown', 'NOUN(SUBJECT)'), ('brown', 'NOUN(OBJ)')}\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 1 elements\n",
"{('quick', 'DET')}\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 0 elements\n",
"set()\n",
"Iteration 1 size = 23\n",
"{('brown', 'DET'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}\n",
"Calling with noun_immediately_precedes_verb\n",
"noun_immediately_precedes_verb removed 0 elements\n",
"set()\n",
"Calling with remove_noun_nodoubles\n",
"remove_noun_nodoubles removed 0 elements\n",
"set()\n",
"Calling with remove_det_nonoun\n",
"remove_det_nonoun removed 0 elements\n",
"set()\n",
"Calling with jumps_is_verb\n",
"jumps_is_verb removed 0 elements\n",
"set()\n",
"Reached Fixedpoint\n"
]
},
{
"data": {
"text/plain": [
"{('brown', 'DET'),\n",
" ('brown', 'VERB'),\n",
" ('dog', 'NOUN(OBJ)'),\n",
" ('dog', 'NOUN(SUBJECT)'),\n",
" ('dog', 'VERB'),\n",
" ('fox', 'NOUN(OBJ)'),\n",
" ('fox', 'NOUN(SUBJECT)'),\n",
" ('jumps', 'VERB'),\n",
" ('lazy', 'DET'),\n",
" ('lazy', 'NOUN(OBJ)'),\n",
" ('lazy', 'NOUN(SUBJECT)'),\n",
" ('lazy', 'VERB'),\n",
" ('over', 'DET'),\n",
" ('over', 'NOUN(OBJ)'),\n",
" ('over', 'NOUN(SUBJECT)'),\n",
" ('over', 'VERB'),\n",
" ('quick', 'NOUN(OBJ)'),\n",
" ('quick', 'NOUN(SUBJECT)'),\n",
" ('quick', 'VERB'),\n",
" ('the', 'DET'),\n",
" ('the', 'NOUN(OBJ)'),\n",
" ('the', 'NOUN(SUBJECT)'),\n",
" ('the', 'VERB')}"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Every word needs at least one part of speech\n",
"def consistent(element: LatticeElement) -> bool:\n",
" return len(set(word for word, _ in element)) == len(set(words))\n",
"\n",
"print(\"The lattice top element, every word has every tag:\")\n",
"assert consistent(domain)\n",
"assert not consistent({(\"jumps\", \"VERB\")})\n",
"\n",
"# Every word has exactly one tag\n",
"def model(element: LatticeElement) -> bool:\n",
" return consistent(element) and len(element) == len(set(words))\n",
"\n",
"assert not model(domain)\n",
"assert model({(word, \"DET\") for word in words})\n",
"\n",
"operators = [jumps_is_verb, remove_det_nonoun, remove_noun_nodoubles, noun_immediately_precedes_verb]\n",
"\n",
"def with_info(operator, x: LatticeElement) -> LatticeElement:\n",
" print(f\"Calling with {operator.__name__}\")\n",
" result = operator(x)\n",
" print(f\"{operator.__name__} removed {len(x) - len(result)} elements\")\n",
" print(x.difference(result))\n",
" return result\n",
"\n",
"def compose(f1, f2):\n",
" return lambda x: f1(f2(x))\n",
"\n",
"from functools import reduce, partial\n",
"from pprint import pprint\n",
"all_operator = reduce(compose, map(partial(partial, with_info), operators))\n",
"\n",
"def fixpoint(op, start):\n",
" print(f\"Start size = {len(start)}\")\n",
" i = 0\n",
" while (succ := op(start)) != start:\n",
" print(f\"Iteration {i} size = {len(succ)}\")\n",
" pprint(succ)\n",
" start = succ\n",
" i += 1\n",
" print(\"Reached Fixedpoint\")\n",
" return start\n",
"\n",
"\n",
"fixpoint(all_operator, set(domain))\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"1. Build a sentence for which the above operators produce inconsistent results\n",
"1. Construct more operators that complete the sentence\n",
"1. Rebuild the code so that it starts with the empty set and adds the POS it refutes"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# A fuzzier approach\n",
"\n",
"- Binary decisions expect a lot from diagnostics\n",
"- A graded approach would be better\n",
"- Modify to use triples (word: str, tag: str, confidence: float)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7 (main, Nov 24 2022, 19:45:47) [GCC 12.2.0]"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "e7370f93d1d0cde622a1f8e1c04877d8463912d04d973331ad4851f04de6915a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}