Information Extraction

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 133    Accepted Submission(s): 51


Problem Description
Extracting information from an HTML document is a complex task. For example, journalists often need to extract news items from a Web site (including the title, content, time, etc.) and convert them into a specific format. Because different HTML documents have different structures, and copying and pasting by hand is tedious, you need to write a program that stores a specific output format, the structures of some HTML documents, and the mapping from each structure to the output format. When an HTML document is input, the program must output text in the specific format according to the mapping.
HTML documents are defined as follows:
HTML
    HTML stands for HyperText Markup Language.
    HTML is a markup language.
    A markup language is a set of markup tags.
    The tags describe document content.
    HTML documents consist of tags and texts.
Tags
    HTML uses tags for its syntax.
    A tag is composed of special characters: '<', '>' and '/'.
    Tags usually come in pairs, the opening tag and the closing tag.
    The opening tag starts with "<" and the tagname. It usually ends with a ">".
    The closing tag starts with "</" and the same tagname as the corresponding opening tag. It ends with a ">".
    There will not be any other angle brackets in the documents.
    Tagnames are strings containing only lowercase letters.
    Tags will contain no line break ('\n').
    Apart from tags, everything that occurs in the document is considered text content.
    The length of a tagname is less than or equal to 30.
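
The tag rules above suggest a simple first step: split a document into tags and text. The following is a minimal sketch, not a reference solution. The names TAG_RE and tokenize are made up for illustration, and the sketch relies on the statement's guarantee that angle brackets appear only as part of tags.

```python
import re

# Any run starting with '<' and ending with '>' is a tag, because the
# statement guarantees no other angle brackets occur in a document.
TAG_RE = re.compile(r"<[^<>]*>")

def tokenize(doc):
    """Split doc into ('tag', s) and ('text', s) tokens, in document order."""
    tokens = []
    pos = 0
    for m in TAG_RE.finditer(doc):
        if m.start() > pos:
            # Everything between two tags is text content.
            tokens.append(("text", doc[pos:m.start()]))
        tokens.append(("tag", m.group()))
        pos = m.end()
    if pos < len(doc):
        tokens.append(("text", doc[pos:]))
    return tokens
```

Whitespace and line breaks between tags come out as text tokens; whether they count as meaningful content is left to the caller.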
Elements
    An element is everything from an opening tag to the matching closing tag (including the two tags).
    The element content is everything between the opening and the closing tag.
    Some elements may have no content. They're called empty elements, like <hr></hr>.
    Empty elements can be closed in the opening tag, ending with a "/>" instead of ">".
    All elements are closed either with a closing tag or in the opening tag.
    Elements can have attributes.
    Elements can be nested (can contain other elements).
    The <html> element is the container for all other elements; it will not have any attributes.
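
Because every element is closed explicitly, the nesting can be recovered with a stack. The sketch below is one possible way to do this and is not taken from the problem setters. The function name build_tree and the dictionary layout are hypothetical. It also assumes that tagnames may contain digits, since the example uses <h3> even though the rules mention lowercase letters only.

```python
import re

# Groups: closing slash, tagname, raw attribute text, self-closing slash.
# Assumption: digits are allowed in tagnames because the example uses <h3>.
TAG_RE = re.compile(r"<(/?)([a-z0-9]+)([^<>]*?)(/?)>")

def build_tree(doc):
    """Return the top-level nodes of doc.

    Each element is a dict {'tag', 'attrs_raw', 'children'}; children are
    element dicts or plain text strings, in document order.
    """
    root = {"tag": None, "attrs_raw": "", "children": []}
    stack = [root]
    pos = 0
    for m in TAG_RE.finditer(doc):
        if m.start() > pos:
            stack[-1]["children"].append(doc[pos:m.start()])
        closing, name, raw, self_closed = m.groups()
        if closing:
            # A closing tag must match the innermost open element.
            if len(stack) == 1 or stack[-1]["tag"] != name:
                raise ValueError("unexpected closing tag: " + name)
            stack.pop()
        else:
            node = {"tag": name, "attrs_raw": raw.strip(), "children": []}
            stack[-1]["children"].append(node)
            if not self_closed:
                stack.append(node)  # stays open until its closing tag
        pos = m.end()
    if pos < len(doc):
        stack[-1]["children"].append(doc[pos:])
    if len(stack) != 1:
        raise ValueError("unclosed element: " + stack[-1]["tag"])
    return root["children"]
```

An element closed in its opening tag, such as <br/>, becomes a node with an empty children list and is never pushed on the stack.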
Attributes
    Attributes provide additional information about an element.
    Attributes are always specified in the opening tag after the tagname.
    The tagname and the attributes are separated by a single space.
    An element may have several attributes.
    Attributes come in name="value" pairs like class="icpc".
    There will not be any space around the '='.
    All attribute names are in lowercase.
    The value of the id attribute is unique, and its length is less than or equal to 30.
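
The attribute rules above can be turned into a small parser for a single opening tag. This is a sketch under stated assumptions: the names ATTR_RE and parse_opening_tag are illustrative, and attribute values are assumed never to contain a double quote, which the statement does not say explicitly.

```python
import re

# name="value" with no space around '='.
# Assumption: a value never contains a double quote.
ATTR_RE = re.compile(r'([^\s="]+)="([^"]*)"')

def parse_opening_tag(tag):
    """Return (tagname, attrs, self_closed) for an opening tag string."""
    self_closed = tag.endswith("/>")
    inner = tag[1:-2] if self_closed else tag[1:-1]
    # The tagname ends at the first space; the rest holds the attributes.
    name, _, rest = inner.partition(" ")
    attrs = dict(ATTR_RE.findall(rest))
    return name, attrs, self_closed
```

For the matching task only the id attribute matters, so a caller could simply look up attrs.get("id") and ignore the rest.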
A Simple Example
<html><body>
<h3 id="header" class="style1">this is a test</h3>
<div id="content" class="style2">
this is content<br/>
<pre>var x = 1111; </pre>
</div>
</body></html>
The structure of an HTML document is the HTML document itself, except that elements keep only the id attribute and have no text content. An HTML document may have many structures, because the content of some elements may be removed.
A Simple Example
<html><body>
<h3 id="header"></h3>
<div id="content"></div>
</body></html>
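
One way to read the definition of a structure is as a reduction of the document: drop all text and every attribute except id. The sketch below does only that reduction; the function name skeleton is hypothetical. It does not handle the second part of the definition, where the content of some elements is removed (as with the div in the example above), so a full solution would still need a matching step that allows for this.

```python
import re

TAG_RE = re.compile(r"<[^<>]*>")
# Assumption: attribute values never contain a double quote.
ATTR_RE = re.compile(r'([^\s="]+)="([^"]*)"')

def skeleton(doc):
    """Keep only the tags of doc, each with at most its id attribute."""
    out = []
    for tag in TAG_RE.findall(doc):
        if tag.startswith("</"):
            out.append(tag)  # closing tags carry no attributes
            continue
        self_closed = tag.endswith("/>")
        inner = tag[1:-2] if self_closed else tag[1:-1]
        name, _, rest = inner.partition(" ")
        ids = [value for key, value in ATTR_RE.findall(rest) if key == "id"]
        id_part = f' id="{ids[0]}"' if ids else ""
        out.append(f"<{name}{id_part}{'/' if self_closed else ''}>")
    return "".join(out)
```

Applied to the first example document, this keeps the br and pre elements inside the div, which is why the result is not identical to the structure listed above.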
The specific output format looks like an HTML document, but the container for all other elements is not necessarily the <html> element.
A Simple Example
<news>
<title></title>
<content></content>
<flag> others </flag>
</news>
The mapping from the structure to the output format is defined as follows:
The value of an id attribute of a structure-A tagname of the output format
(It means that the content of the element the id belongs to should be used as the content of the elements with that tagname.)
Two Simple Examples
header-title
content-content
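
A mapping line can be split into its two parts as sketched below. The function name parse_mapping is hypothetical. Splitting at the last hyphen is an assumption: tagnames contain only lowercase letters, so the last hyphen should be the separator even if an id value itself happens to contain a hyphen.

```python
def parse_mapping(line):
    """Split an 'id-tagname' mapping line into (id_value, tagname)."""
    # Assumption: the separator is the last hyphen, since a tagname
    # cannot contain one but an id value might.
    id_value, sep, tagname = line.strip().rpartition("-")
    if not sep:
        raise ValueError(f"not a mapping line: {line!r}")
    return id_value, tagname
```

With the two example lines, this gives the pairs ("header", "title") and ("content", "content").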
 

Input
The first line of the input is an integer T (T<=15), the number of test cases.
Each test case begins with a specific output format (length less than or equal to 10000).
Then comes an integer N (0<=N<=30), the number of types of HTML document.
Each type begins with the structure of an HTML document.
Then comes an integer M (0<=M<=30), the number of mappings from the structure to the specific output format. Each of the next M lines is a mapping.
Each test case ends with an HTML document (length less than or equal to 10000).
 

Output
For each test case, first output a line "Case #x:", where x is the case number (starting from 1). If the HTML document matches one of the structures, output the text in the specific format; otherwise output "Can't Identify". If the HTML document matches more than one structure, use the one that appears earliest in the input.
 

Sample Input
2 default title 1
2 header-title content-content
this is content
var x = 1111;
default title 1 1 header-title

xxxx

 

Sample Output
Case #1: this is a test this is content
var x = 1111;
Case #2: Can't Identify
 

Author
FZU
 

Source
 
