GHMM: Generalized Hidden Markov Model
References in periodicals archive
We then apply all the resulting information to the trained GHMM (step 6) from the previous stage to analyze the optimal state of every block in a page.
We have made these new testing data available online on our website for comparison purposes. Three types of experiments were performed to verify the effectiveness of the strategies we proposed for GHMM-based Web IE: block-based GHMM, multiple-attribute incorporation, and a layout-structure-based state transition sequence for GHMM.
Our proposed GHMM, however, is still comparable to Stalker and BWI in terms of extraction precision, and clearly outperforms pure-text term-based HMM.
5.4 GHMM with State Transition Sequence based on Layout Structure
In this experiment we verify the effectiveness of our proposed state transition sequence for GHMM-based Web IE.
Three different approaches were applied to the testing data: (1) GHMM with the proposed state transition sequence; (2) GHMM with a left-to-right, then top-to-bottom state transition sequence; (3) single-emission HMM (here we consider only the term as the emission feature).
10 we can find that GHMM with the state transition sequence based on layout structure outperforms GHMM with left-to-right and then top-to-bottom transition sequence.
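As a rough illustration of the baseline ordering in approach (2), the sketch below arranges page blocks left to right within a row and then top to bottom across rows. The `Block` fields, the `reading_order` name, and the `row_tolerance` value are assumptions made for this example and do not come from the excerpted paper. The layout-structure-based sequence that the excerpt reports as performing better is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical block record: a label plus the pixel position of its top-left corner.
    name: str
    left: int
    top: int


def reading_order(blocks, row_tolerance=10):
    """Return blocks left-to-right within each row, rows taken top-to-bottom.

    Blocks whose top edge lies within `row_tolerance` pixels of the first
    block in the current row are treated as belonging to that row
    (an assumed heuristic).
    """
    rows = []
    for block in sorted(blocks, key=lambda b: (b.top, b.left)):
        if rows and abs(block.top - rows[-1][0].top) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    # Flatten the rows, ordering each one by horizontal position.
    return [b for row in rows for b in sorted(row, key=lambda b: b.left)]
```

The resulting list gives the order in which block observations would be fed to the model under this baseline.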
To investigate how multiple attributes of observation symbols under GHMMs contribute to Web information extraction, we design a task of extracting functional blocks on Web pages from
The experimental results of Web IE using GHMMs with different weight factor sets are shown in Figure 9 and Table 5.
Our proposed GHMMs utilize segmented units in a Web page called blocks and multiple attributes to extract Web information.
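To make the idea of block observations with multiple weighted attributes more concrete, here is a minimal sketch of Viterbi decoding in which each block emits several attributes at once. The rule for combining attributes (a weighted sum of per-attribute log-probabilities) and every name in the code (`viterbi_multi_attribute`, `emit_p`, `weights`) are assumptions made for illustration. The excerpts do not state how the original GHMM combines its attributes or weight factors.

```python
import math


def viterbi_multi_attribute(blocks, states, start_p, trans_p, emit_p, weights):
    """Most likely state sequence for a list of multi-attribute blocks.

    blocks  : list of dicts mapping attribute name -> observed symbol
    emit_p  : emit_p[state][attribute][symbol] -> probability (must be > 0)
    weights : attribute name -> weight factor (assumed combination rule)
    """
    def log_emit(state, block):
        # Assumed rule: weighted sum of the per-attribute log-probabilities.
        return sum(w * math.log(emit_p[state][attr][block[attr]])
                   for attr, w in weights.items())

    scores = [{s: math.log(start_p[s]) + log_emit(s, blocks[0]) for s in states}]
    back = [{}]
    for t in range(1, len(blocks)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at position t.
            prev = max(states,
                       key=lambda p: scores[t - 1][p] + math.log(trans_p[p][s]))
            scores[t][s] = (scores[t - 1][prev] + math.log(trans_p[prev][s])
                            + log_emit(s, blocks[t]))
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(blocks) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

Setting the weight of one attribute to 1 and all others to 0 reduces this to an ordinary single-emission HMM decoder, which corresponds to the term-only baseline mentioned in the excerpts.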
The tables below show the results obtained for the two developed systems, traditional GHMM recognition and the hybrid (DTW/GHMM) system, applied to the different vocabularies:
A performance variation of about 2-10% is observed between the GHMM and GHMM/DTW systems on the registered test set.