Class PresentationBasedChunksSource


  • public class PresentationBasedChunksSource
    extends ChunksSource
    A chunk source that follows some presentation patterns in order to improve the chunk extraction. Chunk extraction goes through the tree of areas. For every leaf area of the tree, the chunk extraction consists of the following phases:
    1. Box extraction - extraction of the source boxes from the given source areas. We obtain a list of boxes that are later used as the source text for extracting the text chunks.
    2. Occurrence extraction - location of the tag occurrences in the source box text. We obtain a list of occurrences.
    3. Chunk creation - creation of the chunks from the occurrences. We obtain a list of chunks found in the box text.
    Finally, the chunks obtained from the individual areas are joined into a single list. Each phase of the extraction may be influenced by presentation hints registered using the addHint(Tag, PresentationHint) method.
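
The three phases above can be illustrated with a minimal, self-contained sketch. It does not use the library's classes: `ChunkPipelineSketch`, `Occurrence`, `Chunk`, and all method names are hypothetical stand-ins, a leaf area is reduced to a list of box texts, and a tag occurrence is assumed to be a plain keyword match. Presentation hints are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types belong to the documented library.
final class ChunkPipelineSketch {

    /** A tag match located in a box text (assumed shape). */
    static final class Occurrence {
        final String tag;
        final int start;
        final String text;
        Occurrence(String tag, int start, String text) {
            this.tag = tag;
            this.start = start;
            this.text = text;
        }
    }

    /** A text chunk created from one occurrence (assumed shape). */
    static final class Chunk {
        final String tag;
        final String text;
        Chunk(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    // Phase 1: box extraction. A leaf area is modelled as a list of box texts.
    static List<String> extractBoxes(List<String> leafArea) {
        return new ArrayList<>(leafArea);
    }

    // Phase 2: occurrence extraction. Locates every non-overlapping keyword match.
    static List<Occurrence> findOccurrences(String boxText, String tag, String keyword) {
        List<Occurrence> result = new ArrayList<>();
        if (keyword.isEmpty()) {
            return result; // an empty keyword would match everywhere
        }
        int pos = boxText.indexOf(keyword);
        while (pos >= 0) {
            result.add(new Occurrence(tag, pos, keyword));
            pos = boxText.indexOf(keyword, pos + keyword.length());
        }
        return result;
    }

    // Phase 3: chunk creation. Each occurrence becomes one chunk.
    static List<Chunk> createChunks(List<Occurrence> occurrences) {
        List<Chunk> chunks = new ArrayList<>();
        for (Occurrence occ : occurrences) {
            chunks.add(new Chunk(occ.tag, occ.text));
        }
        return chunks;
    }

    // Runs the three phases for every leaf area and joins the results into one list.
    static List<Chunk> extractAll(List<List<String>> leafAreas, String tag, String keyword) {
        List<Chunk> all = new ArrayList<>();
        for (List<String> area : leafAreas) {
            for (String box : extractBoxes(area)) {
                all.addAll(createChunks(findOccurrences(box, tag, keyword)));
            }
        }
        return all;
    }
}
```

Calling `ChunkPipelineSketch.extractAll(areas, "date", "1980")` on a list of leaf areas returns the chunks of all areas in one list, in the order the areas and boxes were visited.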
    Author:
    burgetr
    • Constructor Detail

      • PresentationBasedChunksSource

        public PresentationBasedChunksSource(Area root,
                                             TaggerConfig tagConfig,
                                             float minTagSupport,
                                             ChunksCache cache)
        Creates a new source.
        Parameters:
        root - the root area of the area tree
        minTagSupport - the minimal tag support required for an area to be considered for chunk extraction
        cache - the cache of already extracted chunks, used for sharing chunks among different sources, or null when no cache should be used