129 Commits (78bd82f8ef571e1f16758c44a46b8aab83d1eb09)

Author SHA1 Message Date
Roland Haeder 841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
12 years ago
Michael Peter Christen 5878c1d599 - refactoring of log to ConcurrentLog:
12 years ago
Michael Peter Christen e6f361f474 adding the canonical tag to crawl queues
12 years ago
Michael Peter Christen 16d1d744fa added url_file_name_s in default collection schema for the file name
12 years ago
orbiter 7de5b9cfa0 fix for http://bugs.yacy.net/view.php?id=233
12 years ago
Michael Peter Christen 25499eead5 - added a new field for the regular expression in crawl start
12 years ago
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
12 years ago
reger 3897bb4409 added (manual) urldb migration (link on: Index Administration -> Federated Solr Index)
12 years ago
Michael Peter Christen 34f8786508 removed dependency of vocabulary navigation from Jena and its
12 years ago
Michael Peter Christen d6b82840f8 added a feature to find similarities in documents.
12 years ago
Michael Peter Christen d88eb657fd Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
12 years ago
Michael Peter Christen 6905182d41 - fix for number of words log message
12 years ago
reger 722a447b0d - optimize code of augmented parsing to enhance document tags
12 years ago
orbiter 276dd6452b removed warnings
12 years ago
reger 87aab9aa7c - fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
12 years ago
Michael Peter Christen ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
13 years ago
Michael Peter Christen 00c1c777fa refactoring
13 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
orbiter fc0f9543fe More SentenceReader cleanup
13 years ago
orbiter 78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
13 years ago
Michael Peter Christen ad09b786bf clean up parser data
13 years ago
Michael Peter Christen 786be7d175 better integration of RDFaParser
13 years ago
Michael Peter Christen 9264d8b4af removed old navigation practice using subject tags in favor of
13 years ago
Michael Peter Christen 16d8f33795 added objectlink generation to vocabulary generation and editor
13 years ago
Michael Peter Christen 61bb52d55c - using http://purl.org/dc/terms/references to refer from an
13 years ago
Michael Peter Christen 8b53771db2 changed behavior of navigation processing:
13 years ago
Michael Peter Christen 5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
13 years ago
cominch bbfc53b663 bugfix
13 years ago
cominch 65c5826d93 bugfix
13 years ago
Michael Peter Christen 9b4c699526 enhanced location search:
13 years ago
Roland 'Quix0r' Haeder a093ccf5eb Now used synchronization in all close() methods to make sure all objects
13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen 8bee1472c9 there is no noindex, only nofollow in links
13 years ago
Michael Peter Christen a58dc4a91f added autotagging to document condenser:
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
apfelmaennchen 564374d1fe - included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
13 years ago
orbiter 85a5487d6d YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
14 years ago
orbiter 49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
14 years ago
orbiter 2d4bb139d3 - added counting of links with noindex tag for solr index
14 years ago
orbiter 0c1b29f3c9 - applied many small performance hacks
14 years ago
orbiter e3d19d0a90 fix in Document inboundlinks/outboundlinks sorting
14 years ago
orbiter f6077b3cc0 added more attributes for html parser and enhanced data structures
14 years ago
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
low012 11ea966f9e *) added SID file (Commodore 64) sound file parser
14 years ago
low012 3d95981f7d *) cleaning up the code a little bit
14 years ago
orbiter 4e2c14efbb fixed bugs in parser and ftp client
14 years ago
orbiter f0651e5f2f added image search to yacyinteractive.html
14 years ago
orbiter b769cce433 - added a catch-all parser for all documents that cannot be parsed: they will contribute only their document url to the search index
14 years ago
orbiter 10a9cb1971 simplified snippet computation process and separated the algorithm into two classes
15 years ago
orbiter 0010cd9db1 Support for indexing of RSS feeds!
15 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
15 years ago
orbiter 4cd5418963 removed finalize methods because of a hint in
15 years ago
orbiter f204076d25 removed usage of temporary files: causes too much IO
15 years ago
orbiter 6c093d6aed - enhanced domain navigator computation
15 years ago
orbiter 54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request
15 years ago
orbiter 56e0d9bd01 - testings with image parser
15 years ago
orbiter 82f57f79e5 more PMD enhancements
15 years ago
orbiter dff4f95c78 some patches to get the torrent parser working
15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
15 years ago
orbiter 969123385b added json and rss output for image search
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter a8ce192f63 - shifted main classes to new package net.yacy
16 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
16 years ago