Commit Graph

79 Commits (4c242f9af9116a77361201c26d58ea08263c2885)

Author SHA1 Message Date
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen e6f361f474 adding the canonical tag to crawl queues
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
orbiter 7de5b9cfa0 fix for http://bugs.yacy.net/view.php?id=233
12 years ago
Michael Peter Christen 25499eead5 - added a new field for the regular expression in crawl start
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
reger 3897bb4409 added (manual) urldb migration (link on: Index Administraton -> Federated Solr Index)
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and it's
12 years ago
Michael Peter Christen d6b82840f8 added a feature to find similarities in documents.
12 years ago
Michael Peter Christen d88eb657fd Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen 6905182d41 - fix for number of words log message
12 years ago
reger 722a447b0d - optimize code of augmented parsing to enhence document tags
12 years ago
orbiter 276dd6452b removed warnings
12 years ago
reger 87aab9aa7c - fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen 00c1c777fa refactoring
13 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
orbiter fc0f9543fe More SentenceReader cleanup
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen ad09b786bf clean up parser data
13 years ago
Michael Peter Christen 786be7d175 better integration of RDFaParser
13 years ago
Michael Peter Christen 9264d8b4af removed old navigation practice using subject tags in favor of
13 years ago
Michael Peter Christen 16d8f33795 added objectlink generation to vocabulary generation and editor
13 years ago
Michael Peter Christen 61bb52d55c - using http://purl.org/dc/terms/references to refer from an
13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing:
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
cominch bbfc53b663 bugfix
13 years ago
cominch 65c5826d93 bugfix
13 years ago
Michael Peter Christen 9b4c699526 ehanced location search:
13 years ago
Roland 'Quix0r' Haeder a093ccf5eb Now used synchronization in all close() methods to make sure all objects
13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen 8bee1472c9 there is no noindex, only nofollow in links
13 years ago
Michael Peter Christen a58dc4a91f added autotagging to document condenser:
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
apfelmaennchen 564374d1fe - included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
13 years ago
orbiter 85a5487d6d YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
14 years ago
orbiter 49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), the all embedded image, audio and video links from all parsed documents are added to the search index as individual document. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
14 years ago
orbiter 2d4bb139d3 - added counting of links with noindex tag for solr index
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
orbiter e3d19d0a90 fix in Document inboundlinks/outboundlinks sorting
14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago