mimir.statistics

FuncDep

Related Docs: object FuncDep | package statistics

class FuncDep extends Serializable

Annotations
@SerialVersionUID()
Linear Supertypes
Serializable, AnyRef, Any

Instance Constructors

  1. new FuncDep(config: Map[String, PrimitiveValue] = Map())

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def best(pair: String, entity: Integer): ArrayList[((Integer, Integer), Float)]

  6. var blackList: Set[Int]

  7. val blackListThreshold: Double

    Minimum percentage of non-null values a column must have to be included in the calculations.

  8. def buildEntities(db: Database, query: Operator, tableName: String): Unit

  9. def buildEntityGraph(g: DelegateTree[Integer, String], parent: Integer): Unit

  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. val combineEntityGraphs: Boolean

    If false, each entity will have its own graph.

  12. val computeEntityGraphs: Boolean

    ???

  13. def constructFDG(): Unit

    ConstructFDG constructs the functional dependency graph. The steps are as follows:

    • Compare each column to every other column
      • Compare using something other than strings: create a tuple instead of merging, and merge only when the two PrimitiveValue types conflict
    • From this determine the strength between two columns
    • Strength is defined using the formula strength(column1,column2) = (# unique column1 - # unique column1 mode(column2) pairs) / (# unique (column1,column2) pairs - # unique column1 mode(column2) pairs)
    • This strength relationship is directional
      • The merging to avoid cycles could be done better
    • For a dependency to exist, the strength must be >= the threshold and the density of column1 must be >= the density of column2 (the density requirement is sometimes omitted for tests)
    • Find the longest path in this graph for each column
    • Create a tree from this heuristic
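
The strength formula in the steps above can be sketched in isolation. This is only one reading of it, not the code behind constructFDG: it assumes that "# unique column1 mode(column2) pairs" means the number of distinct column1 values that co-occur with the most frequent value of column2. The object StrengthSketch and its strength function are hypothetical names, and plain tuples stand in for PrimitiveValue columns.

```scala
// Hypothetical helper, not part of FuncDep: strength of the candidate
// dependency column1 -> column2 under the assumed reading of the formula.
object StrengthSketch {

  // rows holds one (column1, column2) value pair per table row.
  // Returns None when the input is empty or the denominator is zero.
  def strength[A, B](rows: Seq[(A, B)]): Option[Double] = {
    if (rows.isEmpty) None
    else {
      // Most frequent value of column2 (ties are broken arbitrarily).
      val mode: B = rows.groupBy(_._2).maxBy(_._2.size)._1

      // Number of unique column1 values.
      val uniqueCol1 = rows.map(_._1).distinct.size
      // Number of unique (column1, column2) pairs.
      val uniquePairs = rows.distinct.size
      // Number of unique column1 values paired with mode(column2).
      val uniqueModePairs = rows.filter(_._2 == mode).map(_._1).distinct.size

      val denominator = uniquePairs - uniqueModePairs
      if (denominator == 0) None
      else Some((uniqueCol1 - uniqueModePairs).toDouble / denominator)
    }
  }
}
```

Swapping the two columns generally gives a different value, which matches the note that the relationship is directional. A caller would then compare the result against the threshold and the density condition described in the list.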
  14. var countTable: IndexedSeq[Map[PrimitiveValue, Long]]

  15. def createViews(): ArrayList[String]

  16. var densityTable: IndexedSeq[Long]

  17. def depth(node: Integer, score: Integer): Integer

  18. var edgeTable: Buffer[(Int, Int, Double)]

  19. var endTime: Long

  20. var entityGraphList: ArrayList[DelegateTree[Integer, String]]

  21. var entityGraphString: ArrayList[DelegateTree[String, String]]

  22. var entityPairList: ArrayList[(Integer, Integer)]

  23. var entityPairMatrix: TreeMap[String, TreeMap[Integer, TreeMap[Integer, Float]]]

  24. def entityPairMatrixResult(): Unit

  25. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  27. var fdGraph: DirectedSparseMultigraph[Int, (Int, Int)]

  28. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  29. val flattenParentTable: Boolean

    Set to true to flatten the parent table so that it uses the child of the root.

  30. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  31. def getPairs(entity: Integer): ArrayList[String]

  32. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  33. def initializeTables(schema: Seq[(String, Type)], tName: String): Unit

  34. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  35. val logger: Logger

  36. def longestPath(g: DirectedSparseMultigraph[Int, (Int, Int)], predList: Collection[Int], currentPath: ArrayList[Int]): ArrayList[Int]

  37. def matchEnt(graphPairs: TreeMap[String, UndirectedSparseMultigraph[Integer, String]], parentTable: TreeMap[Integer, ArrayList[Integer]]): Unit

  38. def mergeEntities(): Unit

  39. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  40. var nodeTable: ArrayList[Integer]

  41. final def notify(): Unit

    Definition Classes
    AnyRef
  42. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  43. var onetoone: ListBuffer[(Int, Int)]

  44. val outputEntityGraphs: Boolean

    true if you want the entity graphs to be output

  45. def parentOfLongestPath(g: DirectedSparseMultigraph[Int, (Int, Int)], v: Int): Int

  46. var parentTable: Map[Int, Set[Int]]

  47. def preprocessFDG(db: Database, query: Operator): Unit

    Preprocess collects all the data needed to build the functional dependency graph and to create the entities. It consumes the ResultSet and ResultIterator and keeps this information in memory, so it has a high upfront RAM cost. The implementation could be changed so that each column queries the database and collects this information in parallel. The data collected is a count of each unique value per column and the total number of nulls.
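
As a rough illustration of the statistics described above, the following sketch counts each distinct value per column and the number of nulls per column for rows already held in memory. It is not the preprocessFDG implementation: the object ColumnStatsSketch and its collect function are hypothetical, Option[A] stands in for a nullable PrimitiveValue (None playing the role of NULL), and every row is assumed to have at least numColumns cells.

```scala
// Hypothetical helper, not part of FuncDep: per-column value counts and
// per-column null counts for an in-memory table.
object ColumnStatsSketch {

  // rows: one IndexedSeq of cells per table row; None represents a null.
  // Returns (value counts per column, null count per column).
  def collect[A](
      rows: Seq[IndexedSeq[Option[A]]],
      numColumns: Int
  ): (IndexedSeq[Map[A, Long]], IndexedSeq[Long]) = {
    // Pivot the row-oriented input into one sequence of cells per column.
    val columns: IndexedSeq[Seq[Option[A]]] =
      (0 until numColumns).map(i => rows.map(row => row(i)))

    // For each column, count how often each non-null value occurs.
    val counts: IndexedSeq[Map[A, Long]] = columns.map { col =>
      col.flatten.groupBy(identity).map { case (v, vs) => v -> vs.size.toLong }
    }

    // For each column, count the null cells.
    val nulls: IndexedSeq[Long] = columns.map(col => col.count(_.isEmpty).toLong)

    (counts, nulls)
  }
}
```

Like the approach the description mentions, this keeps the whole table in memory. Each column is processed independently here, so collecting the statistics per column in parallel would follow the same shape.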

  48. var sch: Seq[(String, Type)]

  49. def serialize(): Array[Byte]

  50. def serializeTo(db: Database, name: String): Unit

  51. def showEntity(g: DelegateTree[Integer, String]): Unit

  52. val showFDGraph: Boolean

    True if the functional dependency graph should be shown.

  53. def showGraph(g: DirectedSparseMultigraph[Int, (Int, Int)]): Unit

  54. var singleEntityGraph: DelegateTree[Integer, String]

  55. var startTime: Long

  56. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  57. var table: IndexedSeq[Buffer[PrimitiveValue]]

  58. var tableName: String

  59. val threshhold: Double

    The threshold that determines if there is a functional dependency between two columns

  60. def toString(): String

    Definition Classes
    AnyRef → Any
  61. def updateEntityGraph(): Unit

  62. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  63. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  64. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  65. def writeToFile(outputFileName: String): Unit

