Package cz.vutbr.fit.layout.text.chunks
Class PresentationBasedChunksSource
- java.lang.Object
  - cz.vutbr.fit.layout.text.chunks.ChunksSource
    - cz.vutbr.fit.layout.text.chunks.PresentationBasedChunksSource
public class PresentationBasedChunksSource extends ChunksSource
A chunk source that follows some presentation patterns in order to improve the chunk extraction. Chunk extraction traverses the tree of areas. For every leaf area of the tree, the chunk extraction consists of the following phases:
- Box extraction - extraction of the source boxes from the given source areas. We obtain a list of boxes that are later used as the source text for extracting the text chunks.
- Occurrence extraction - location of the tag occurrences in the source box text. We obtain a list of occurrences.
- Chunk creation - creation of the chunks from the occurences. We obtain a list of chunks found in the box text.
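The three phases listed above can be illustrated with a small, self-contained Java sketch for a single leaf area. It is not the FitLayout implementation: the types LeafArea, Occurrence and Chunk are hypothetical stand-ins, and a regular expression plays the role of the tagger that locates the tag occurrences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the three chunk-extraction phases for one leaf area.
 * LeafArea, Occurrence, Chunk and the regex "tagger" are illustrative assumptions,
 * not the FitLayout API.
 */
final class ChunkPhasesSketch {

    /** A leaf area holding the texts of its source boxes (assumption). */
    record LeafArea(List<String> boxTexts) {}

    /** An occurrence of a tag in a box text, as character offsets [start, end). */
    record Occurrence(int start, int end) {}

    /** A text chunk created from one occurrence. */
    record Chunk(String text) {}

    // Phase 1: box extraction - obtain the source texts of the leaf area's boxes.
    static List<String> extractBoxes(LeafArea area) {
        return new ArrayList<>(area.boxTexts());
    }

    // Phase 2: occurrence extraction - locate the tag occurrences in one box text.
    // A regular expression stands in for a real tagger here.
    static List<Occurrence> findOccurrences(String boxText, Pattern tagPattern) {
        List<Occurrence> result = new ArrayList<>();
        Matcher m = tagPattern.matcher(boxText);
        while (m.find()) {
            result.add(new Occurrence(m.start(), m.end()));
        }
        return result;
    }

    // Phase 3: chunk creation - turn each occurrence into a chunk of the box text.
    static List<Chunk> createChunks(String boxText, List<Occurrence> occurrences) {
        List<Chunk> chunks = new ArrayList<>();
        for (Occurrence o : occurrences) {
            chunks.add(new Chunk(boxText.substring(o.start(), o.end())));
        }
        return chunks;
    }

    /** Runs the three phases for one leaf area and collects all chunks. */
    static List<Chunk> chunksForLeaf(LeafArea area, Pattern tagPattern) {
        List<Chunk> chunks = new ArrayList<>();
        for (String boxText : extractBoxes(area)) {
            chunks.addAll(createChunks(boxText, findOccurrences(boxText, tagPattern)));
        }
        return chunks;
    }
}
```

For example, with the pattern \d{1,2}\.\d{1,2}\.\d{4} and the box text "Meeting on 12.5.2024 and 3.6.2024", chunksForLeaf returns the two chunks 12.5.2024 and 3.6.2024.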
Presentation hints may be registered using the addHint(Tag, PresentationHint) method.
- Author:
- burgetr
Constructor Summary
- PresentationBasedChunksSource(Area root, TaggerConfig tagConfig, float minTagSupport, ChunksCache cache) - Creates a new source.
Method Summary
- void addHint(Tag tag, PresentationHint hint)
- List<TextChunk> getTextChunks() - Extracts a list of chunks from the source area tree.
- String toString()
Methods inherited from class cz.vutbr.fit.layout.text.chunks.ChunksSource
getRoot
Constructor Detail
PresentationBasedChunksSource
public PresentationBasedChunksSource(Area root, TaggerConfig tagConfig, float minTagSupport, ChunksCache cache)
Creates a new source.
- Parameters:
  root - the root area of the area tree
  minTagSupport - the minimal support of the tags required for considering the areas for chunk extraction
  cache - the cache of already extracted chunks, used for sharing the chunks among different sources, or null when no cache should be used
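As a rough illustration of how the minTagSupport and cache parameters might interact, the following sketch skips areas whose tag support falls below the threshold and reuses chunks from an optional shared cache, where null disables caching. Everything here is an assumption: TaggedArea, ChunkCache and the extractor function are hypothetical, and the rule that an area qualifies when any of its tags reaches minTagSupport is a guess at the semantics, not the documented behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a support threshold combined with an optional shared cache. */
final class SourceParamsSketch {

    /** A hypothetical area carrying tag supports in [0, 1], keyed by tag name. */
    record TaggedArea(String id, Map<String, Float> tagSupport) {}

    /** Hypothetical shared cache: area id -> chunks already extracted for that area. */
    static final class ChunkCache {
        private final Map<String, List<String>> byArea = new HashMap<>();

        List<String> get(String areaId) { return byArea.get(areaId); }

        void put(String areaId, List<String> chunks) { byArea.put(areaId, chunks); }
    }

    /** Assumed rule: the area is considered when any tag support is >= minTagSupport. */
    static boolean isConsidered(TaggedArea area, float minTagSupport) {
        for (float support : area.tagSupport().values()) {
            if (support >= minTagSupport) {
                return true;
            }
        }
        return false;
    }

    /**
     * Extracts chunks for the considered areas. A null cache means no caching.
     * The extractor function stands in for the actual chunk extraction.
     */
    static List<String> extract(List<TaggedArea> areas, float minTagSupport, ChunkCache cache,
                                Function<TaggedArea, List<String>> extractor) {
        List<String> result = new ArrayList<>();
        for (TaggedArea area : areas) {
            if (!isConsidered(area, minTagSupport)) {
                continue; // below the support threshold: skip this area
            }
            List<String> chunks = (cache == null) ? null : cache.get(area.id());
            if (chunks == null) {
                chunks = extractor.apply(area);   // cache miss or caching disabled
                if (cache != null) {
                    cache.put(area.id(), chunks); // share the result with later sources
                }
            }
            result.addAll(chunks);
        }
        return result;
    }
}
```

Passing the same ChunkCache instance to several sources lets the second one reuse chunks instead of re-extracting them, which is the sharing the cache parameter describes.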
Method Detail
getTextChunks
public List<TextChunk> getTextChunks()
Description copied from class: ChunksSource
Extracts a list of chunks from the source area tree.
- Specified by:
  getTextChunks in class ChunksSource
- Returns:
  the list of extracted text chunks
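getTextChunks() walks the area tree and processes every leaf area. A minimal recursive sketch of that traversal might look like the following; it assumes a hypothetical Node record in place of Area and uses a trivial placeholder (one chunk per non-blank leaf text) instead of the real per-leaf phases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of collecting chunks from the leaf areas of a tree. */
final class LeafTraversalSketch {

    /** A hypothetical area-tree node: its text plus its child areas. */
    record Node(String text, List<Node> children) {
        static Node leaf(String text) {
            return new Node(text, List.of());
        }
    }

    /** Collects the leaf nodes in depth-first, left-to-right order. */
    static void collectLeaves(Node node, List<Node> out) {
        if (node.children().isEmpty()) {
            out.add(node);
        } else {
            for (Node child : node.children()) {
                collectLeaves(child, out);
            }
        }
    }

    /** Mirrors the shape of getTextChunks(): process every leaf and concatenate the results. */
    static List<String> getTextChunks(Node root) {
        List<Node> leaves = new ArrayList<>();
        collectLeaves(root, leaves);
        List<String> chunks = new ArrayList<>();
        for (Node leaf : leaves) {
            // Placeholder for the per-leaf phases: one chunk per non-blank leaf text.
            if (!leaf.text().isBlank()) {
                chunks.add(leaf.text().strip());
            }
        }
        return chunks;
    }
}
```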
addHint
public void addHint(Tag tag, PresentationHint hint)
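addHint has no description in this reference. Purely as a hypothetical sketch, assuming that hints are grouped per tag and applied in registration order to the chunks found for that tag, a registry could look like the following. Hint, processChunks and the String tag keys are illustrative names, not the PresentationHint or Tag API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-tag hint registry; not the FitLayout implementation. */
final class HintRegistrySketch {

    /** Hypothetical hint that post-processes the chunks found for one tag. */
    interface Hint {
        List<String> processChunks(List<String> chunks);
    }

    // Hints grouped by tag name, each list kept in registration order.
    private final Map<String, List<Hint>> hints = new LinkedHashMap<>();

    /** Analogous to addHint(Tag, PresentationHint): registers one more hint for the tag. */
    void addHint(String tag, Hint hint) {
        hints.computeIfAbsent(tag, t -> new ArrayList<>()).add(hint);
    }

    /** Applies all hints registered for the tag in order; with no hints the chunks pass through. */
    List<String> applyHints(String tag, List<String> chunks) {
        List<String> result = chunks;
        for (Hint h : hints.getOrDefault(tag, List.of())) {
            result = h.processChunks(result);
        }
        return result;
    }
}
```

Keeping a list per tag allows several hints to be stacked for the same tag, while tags without hints are left unchanged.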