Group ID: |
de.l3s.boilerpipe |
Artifact ID: |
boilerpipe |
Version: |
1.1.0 |
Last modified: |
2018-07-25 18:35:44 |
Packaging: |
jar |
License: |
Apache License 2.0 |
Description: |
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extracting content is very fast (milliseconds), just needs the input document (no global or site-level information required) and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
The algorithms used by the library are based on (and extending) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010 -- The Third ACM International Conference on Web Search and Data Mining New York City, NY USA.
|
Related URL: |
http://code.google.com/p/boilerpipe/ |
Size: |
89.87KB |
|
Maven dependency: |
<dependency>
<groupId>de.l3s.boilerpipe</groupId>
<artifactId>boilerpipe</artifactId>
<version>1.1.0</version>
</dependency>
|
Gradle dependency: |
de.l3s.boilerpipe:boilerpipe:1.1.0
|
POM file contents: |
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>de.l3s.boilerpipe</groupId>
<artifactId>boilerpipe</artifactId>
<packaging>jar</packaging>
<version>1.1.0</version>
<url>http://code.google.com/p/boilerpipe/</url>
<licenses>
<license>
<name>Apache License 2.0</name>
</license>
</licenses>
<name>Boilerpipe -- Boilerplate Removal and Fulltext Extraction from HTML pages</name>
<description>The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings.
Extracting content is very fast (milliseconds), just needs the input document (no global or site-level information required) and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
The algorithms used by the library are based on (and extending) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010 -- The Third ACM International Conference on Web Search and Data Mining New York City, NY USA.
</description>
<scm>
<connection>scm:svn:http://boilerpipe.googlecode.com/svn/trunk/</connection>
<url>http://code.google.com/p/boilerpipe/source/browse/</url>
</scm>
<developers>
<developer>
<name>Christian Kohlschütter</name>
</developer>
</developers>
</project>
|
JAR contents: |
META-INF/MANIFEST.MF
de.l3s.boilerpipe.BoilerpipeExtractor.class
de.l3s.boilerpipe.BoilerpipeFilter.class
de.l3s.boilerpipe.BoilerpipeInput.class
de.l3s.boilerpipe.BoilerpipeProcessingException.class
de.l3s.boilerpipe.conditions.TextBlockCondition.class
de.l3s.boilerpipe.document.TextBlock.class
de.l3s.boilerpipe.document.TextDocument.class
de.l3s.boilerpipe.document.TextDocumentStatistics.class
de.l3s.boilerpipe.estimators.SimpleEstimator.class
de.l3s.boilerpipe.extractors.ArticleExtractor.class
de.l3s.boilerpipe.extractors.ArticleSentencesExtractor.class
de.l3s.boilerpipe.extractors.CanolaExtractor$1.class
de.l3s.boilerpipe.extractors.CanolaExtractor.class
de.l3s.boilerpipe.extractors.CommonExtractors.class
de.l3s.boilerpipe.extractors.DefaultExtractor.class
de.l3s.boilerpipe.extractors.ExtractorBase.class
de.l3s.boilerpipe.extractors.KeepEverythingExtractor.class
de.l3s.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor.class
de.l3s.boilerpipe.extractors.LargestContentExtractor.class
de.l3s.boilerpipe.extractors.NumWordsRulesExtractor.class
de.l3s.boilerpipe.filters.english.DensityRulesClassifier.class
de.l3s.boilerpipe.filters.english.HeuristicFilterBase.class
de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter.class
de.l3s.boilerpipe.filters.english.KeepLargestFulltextBlockFilter.class
de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter.class
de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier.class
de.l3s.boilerpipe.filters.english.TerminatingBlocksFinder.class
de.l3s.boilerpipe.filters.heuristics.BlockProximityFusion.class
de.l3s.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier.class
de.l3s.boilerpipe.filters.heuristics.ExpandTitleToContentFilter.class
de.l3s.boilerpipe.filters.heuristics.KeepLargestBlockFilter.class
de.l3s.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor.class
de.l3s.boilerpipe.filters.simple.BoilerplateBlockFilter.class
de.l3s.boilerpipe.filters.simple.InvertedFilter.class
de.l3s.boilerpipe.filters.simple.LabelToBoilerplateFilter.class
de.l3s.boilerpipe.filters.simple.LabelToContentFilter.class
de.l3s.boilerpipe.filters.simple.MarkEverythingContentFilter.class
de.l3s.boilerpipe.filters.simple.MinClauseWordsFilter.class
de.l3s.boilerpipe.filters.simple.MinWordsFilter.class
de.l3s.boilerpipe.filters.simple.SplitParagraphBlocksFilter.class
de.l3s.boilerpipe.labels.ConditionalLabelAction.class
de.l3s.boilerpipe.labels.DefaultLabels.class
de.l3s.boilerpipe.labels.LabelAction.class
de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler$Event.class
de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.class
de.l3s.boilerpipe.sax.BoilerpipeHTMLParser.class
de.l3s.boilerpipe.sax.BoilerpipeSAXInput.class
de.l3s.boilerpipe.sax.CommonTagActions$1.class
de.l3s.boilerpipe.sax.CommonTagActions$2.class
de.l3s.boilerpipe.sax.CommonTagActions$3.class
de.l3s.boilerpipe.sax.CommonTagActions$4.class
de.l3s.boilerpipe.sax.CommonTagActions$5.class
de.l3s.boilerpipe.sax.CommonTagActions$6.class
de.l3s.boilerpipe.sax.CommonTagActions$BlockTagLabelAction.class
de.l3s.boilerpipe.sax.CommonTagActions$Chained.class
de.l3s.boilerpipe.sax.CommonTagActions$InlineTagLabelAction.class
de.l3s.boilerpipe.sax.CommonTagActions.class
de.l3s.boilerpipe.sax.DefaultTagActionMap.class
de.l3s.boilerpipe.sax.HTMLDocument.class
|
Dependency JARs: |
None
|
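The description above says the library separates main content from boilerplate using shallow text features and needs only the input document. As a rough illustration of that idea, the following self-contained Java sketch classifies text blocks by two such features, word count and link density. It does not use or reproduce boilerpipe's own classes or rules: the class name ShallowFeatureSketch, the Block type, and both thresholds are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. This is NOT boilerpipe's classifier; the names
// and the thresholds below are assumptions made up for this example.
class ShallowFeatureSketch {

    // Hypothetical thresholds, not taken from the library or the paper.
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.33;

    /** A text block plus the number of its words that appear inside link text. */
    static final class Block {
        final String text;
        final int linkedWords;

        Block(String text, int linkedWords) {
            this.text = text;
            this.linkedWords = linkedWords;
        }
    }

    /** Counts whitespace-separated tokens in the block text. */
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Fraction of the block's words that are link text (0.0 for empty blocks). */
    static double linkDensity(Block block) {
        int words = wordCount(block.text);
        return words == 0 ? 0.0 : (double) block.linkedWords / words;
    }

    /** Treats a block as content if it is long enough and not dominated by links. */
    static boolean isContent(Block block) {
        return wordCount(block.text) >= MIN_WORDS
                && linkDensity(block) <= MAX_LINK_DENSITY;
    }

    /** Joins the text of all blocks classified as content, one block per line. */
    static String extract(List<Block> blocks) {
        StringBuilder out = new StringBuilder();
        for (Block block : blocks) {
            if (isContent(block)) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(block.text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Block> page = new ArrayList<>();
        page.add(new Block("Home News Sports Contact", 4));
        page.add(new Block("The city council voted on Tuesday to extend the tram line "
                + "to the northern suburbs.", 0));
        page.add(new Block("Copyright 2010 Example Media. All rights reserved.", 0));
        System.out.println(extract(page));
    }
}
```

Each block is judged on its own features, which mirrors the description's point that no global or site-level information is required. The real implementations are the extractor and filter classes listed under the JAR contents (for example de.l3s.boilerpipe.extractors.ArticleExtractor); their actual rules may differ substantially from this sketch.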