49 Commits (e830c2945d5ca2b3e5e394eae123c965937dd9c5)

Author | SHA1 | Message | Date
Michael Peter Christen | 25573bd5ab | added a crawl filter based on <div> tag class names | 7 years ago
luccioman | 6a4d51d8f9 | Cleaned up some Javadoc warnings. | 8 years ago
reger | 6d54eb3d36 | skip loading document on crawl start for YMark bookmarks | 9 years ago
reger | 52e3eb4ce8 | harmonize/correct assignment to Ymarkmeta.mime | 9 years ago
Michael Peter Christen | 97930a6aad | added must-not-match filter to snapshot generation. | 10 years ago
Michael Peter Christen | fed26f33a8 | enhanced timezone managament for indexed data: | 10 years ago
Michael Peter Christen | b5ac29c9a5 | added a html field scraper which reads text from html entities of a | 10 years ago
Michael Peter Christen | 68c605d637 | replace with CommonPattern.SPACE for split | 10 years ago
reger | deb75a1dbe | fix refactored size() -> filesize() in YMarkMetadata | 10 years ago
Michael Peter Christen | 66b5a56976 | Added and integrated new date detection class which can identify date | 10 years ago
Michael Peter Christen | 8df8ffbb6d | enhanced the snapshot functionality: | 10 years ago
reger | ff18129def | ViewFile servlet: update index if newer, | 10 years ago
Michael Peter Christen | 97f6089a41 | YaCy can now create web page snapshots as pdf documents which can later | 10 years ago
Michael Peter Christen | ad0da5f246 | added new web page snapshot infrastructure which will lead to the | 10 years ago
Michael Peter Christen | 6a2a669db4 | added loading of the synonyms file from addon/synonyms into the | 10 years ago
orbiter | 22ce4fb4dd | better error handling for remote solr queries and exists-checks | 10 years ago
Michael Peter Christen | 2de159719b | added an option to set 'obey nofollow' for links with rel="nofollow" | 10 years ago
reger | 727dfb5875 | refactore URIMetadataNode to further unify interaction with index | 11 years ago
Michael Peter Christen | 8b44fcf0f4 | added missing @Override annotation | 11 years ago
Michael Peter Christen | b08375da33 | fix for bad/missing values of size_i | 11 years ago
orbiter | f6e441dd77 | refactoring | 11 years ago
Michael Peter Christen | 6e59ca4ebf | removed jena library and all code that depended on jena. When jena was | 11 years ago
reger | 6932aa4d7a | use configured admin-username for api calls | 11 years ago
orbiter | 3cb6c7861f | fixed shutdown authenticaton problem | 11 years ago
Michael Peter Christen | 2939b47986 | removed non-working realm setting in http client (auth for localhost was | 11 years ago
Michael Peter Christen | 74466d731a | use pre-compiled patterns in ymark | 11 years ago
Michael Peter Christen | 9bb7eab389 | hacks to prevent storage of data longer than necessary during search and | 11 years ago
Michael Peter Christen | 1b4fa2947d | - fixed a problem which ocurred when a document was not recognized with | 11 years ago
Michael Peter Christen | 5e31bad711 | - the webgraph shall store all links which appear on a web page and not | 11 years ago
Michael Peter Christen | 765943a4b7 | Redesign of crawler identification and robots steering. A non-p2p user | 11 years ago
Michael Peter Christen | cf12835f20 | replaced the single-text description solr field with a multi-value | 11 years ago
Roland Haeder | 841a28ae76 | Added 'final' for all exception blocks as this helps the Java compiler | 11 years ago
Michael Peter Christen | 5878c1d599 | - refactoring of log to ConcurrentLog: | 12 years ago
Michael Peter Christen | 57ffdfad4c | added a crawl option to obey html-meta-robots-noindex. This is on by | 12 years ago
Michael Peter Christen | 16d1d744fa | added url_file_name_s in default collection schema for the file name | 12 years ago
Michael Peter Christen | 8f2d3ce2f9 | reduced locking situation in crawler: shifted synchronized location and | 12 years ago
Michael Peter Christen | 25499eead5 | - added a new field for the regular expression in crawl start | 12 years ago
Michael Peter Christen | 4eab3aae60 | removed overhead by preventing generation of full search results when | 12 years ago
cominch | 05742b4562 | remove old SMW importer which was part of the ymarks package | 12 years ago
Michael Peter Christen | 0fe8be7981 | enhaced data structures for balancer and latency computation which | 12 years ago
Michael Peter Christen | ac9540dfb6 | removed options for stopwords which are not used | 12 years ago
Michael Peter Christen | 43f3345c90 | - removed dependencies from URIMetadataRow and made direct access to | 12 years ago
Michael Peter Christen | 5f0ab25382 | removed the option to prevent removal of &amp; parts inside of the | 12 years ago
Michael Peter Christen | 2f536cb54d | code cleanup: removed unised methods and made more methods and objects | 12 years ago
orbiter | 3190347814 | added a synonyms_t field to solr and a process to read synonym files. | 12 years ago
Michael Peter Christen | 24f4ca4d85 | Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git | 12 years ago
apfelmaennchen | 116f429e35 | fix for java.lang.RuntimeException: TableColumnIndex not available... | 12 years ago
Michael Peter Christen | 1533bfd63b | refactoring | 12 years ago
Michael Peter Christen | 00c1c777fa | refactoring | 12 years ago