1105 Commits (2280a7b276df243b687fb7a3ed10c939fd25c9c4)

Author SHA1 Message Date
Michael Peter Christen 22e1f68c0b solrj user authentication patch
13 years ago
Michael Peter Christen 09484955dc added new entry class for embed tags
13 years ago
Michael Peter Christen 62f2554a01 - fixed build problems (deprecated methods using httpclient 3.1)
13 years ago
Michael Peter Christen a6d60fc21f concurrency enhancement in ConfigurationSet
13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization
13 years ago
Michael Peter Christen 5f5ed33ed8 patch for media search (audio, video apps)
13 years ago
Michael Peter Christen 7860c1df80 fix needed for new solrj library
13 years ago
Michael Peter Christen 0e13022147 - enhanced solr field documentation
13 years ago
Michael Peter Christen 19efbf1b0f - apply directDocByURL to NOLOAD Queue
13 years ago
Michael Peter Christen 659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
13 years ago
Michael Peter Christen a3badd3205 changed search process for images: no more media snippet load process,
13 years ago
reger c1f6b4fb52 lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
13 years ago
Michael Peter Christen f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
13 years ago
Michael Peter Christen 14f67f217c refactoring of ContentDomain: now subclass of Classification
13 years ago
Michael Peter Christen 8a08c96a82 removed dependency from logging
13 years ago
Michael Peter Christen a1a5b015d8 refactoring: moved document Classification to cora package
13 years ago
Michael Peter Christen 33d1062c79 refactoring: the cache belongs to the crawler
13 years ago
Michael Peter Christen 4d5da75814 fix for parser problem if a <a>-tag is 'within' html tags with unclosed
13 years ago
Michael Peter Christen 91a86f0b06 fix to network graph testing
13 years ago
Michael Peter Christen 7b5b9baee0 added citation rank to ranking profile
13 years ago
Michael Peter Christen 046f3a7e8d check if httpc has decompressed the release file and rename the file
13 years ago
Michael Christen 02e4dedff2 fix to url citation collection
13 years ago
Michael Christen e32055aa15 added stub classes for
13 years ago
Michael Christen ac5d124ee0 experimental implementation of a citation ranking as post-ranking
13 years ago
Michael Christen 8fc86fe397 added storage of full anchor link structure:
13 years ago
Lotus 0b3f39136e allow custom ppm lower than minimum button on /Crawler_p.html
13 years ago
Michael Peter Christen 532c7cf827 added physics experiment to the graph plotter. not active by default
13 years ago
Michael Peter Christen aba9b1bfa0 better names for elements of a linked graph
13 years ago
Michael Peter Christen 2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
13 years ago
Michael Peter Christen 8aba045ba1 if a new pop-up page is set in config portal, then this page applies
13 years ago
Michael Peter Christen 8c06925984 animation of the web structure picture
13 years ago
Michael Peter Christen 898fa7c3f3 use tld heuristic to check if a domain is local or global
13 years ago
Michael Peter Christen 213c8d97f2 use fewer processes in process pool
13 years ago
Michael Peter Christen c639248c23 protection against strange answers from remote peers during search
13 years ago
Michael Peter Christen 36e4d82b27 changed ranking
13 years ago
Michael Peter Christen 096c17e7cd added test code
13 years ago
Michael Peter Christen 665626a51b catch OOM errors during scanning
13 years ago
Michael Peter Christen 1cd711d005 added classes for citation references (for new citation ranking)
13 years ago
Michael Peter Christen 33a405dab8 ipv6 bugfix
13 years ago
Michael Peter Christen c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148
13 years ago
Michael Peter Christen e0f1e7d904 added new citation reference data structure that shall be used for a
13 years ago
Michael Peter Christen e18a4f6b74 more tolerant merge iterator
13 years ago
Michael Peter Christen e101c2e0e2 added changes from copperdust (submitted by email):
13 years ago
Michael Peter Christen 8d63a5887c bugfixes
13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
13 years ago
Michael Peter Christen 7e4e3fe5b6 free some memory after parsing html
13 years ago
Michael Peter Christen 4540174fe0 memory hacks
13 years ago
Michael Peter Christen b4409cc803 small redesign of blob column index and usage
13 years ago
Michael Peter Christen d5c1f2746e performance hack
13 years ago
Michael Peter Christen 803963aebd performance hack: better space grow in CharBuffer (speeds up html
13 years ago
Michael Peter Christen 8b0920b0b5 tried to fix the ipv6 problem as reported in bug
13 years ago
Michael Peter Christen e2f8f263e8 changed storage of search words: keep order
13 years ago
Michael Peter Christen ed39ef2890 changed generation of protocol information
13 years ago
Michael Peter Christen 0b67a0a5d8 added a column index for tables in blob files. This is heavily used
13 years ago
Michael Peter Christen 2e5cd6a1b2 fixed parser extension deny list generation and usage
13 years ago
Michael Peter Christen 8bee1472c9 there is no noindex, only nofollow in links
13 years ago
Michael Peter Christen 3cd6dcd352 do not add new solr fields as activated fields
13 years ago
Michael Peter Christen e3bb73c3d6 serialized some database access methods
13 years ago
Michael Peter Christen 7e728867e5 added a synchronization around iterations to prevent IO-deadlocking
13 years ago
Michael Peter Christen 355ecf330f reduced target file size to 64mb
13 years ago
Michael Peter Christen 10ae6d94a1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 2ea585d616 fix for host navigator
13 years ago
Michael Peter Christen 2f6dde92e2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen c560a582ac fix for single-word vocabulary lines
13 years ago
Michael Peter Christen 4c5edab1ec added option to have exception search result windows
13 years ago
Michael Peter Christen 046d7de95b Merge remote branch 'reger/master'
13 years ago
reger a95f645a61 Bugfix class repository.Loaddispatcher fixed download file limit of 10000
13 years ago
Michael Peter Christen ef78f22ee1 performance hack
13 years ago
Michael Peter Christen 41536eb4a2 performance hack
13 years ago
Michael Peter Christen f91487fc50 added delete-button for host navigation
13 years ago
Michael Peter Christen e8d24fd802 author navigator can be switched off
13 years ago
Michael Peter Christen 558ab7bd4e made the protocol navigator reversible
13 years ago
Michael Peter Christen 96cb75f1d4 made the filetype navigator be able to deselect the search constraint
13 years ago
Michael Peter Christen 1f4f60654a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
reger 32104360ce PDFParser - return at least first 3 pages of PDF
13 years ago
Michael Peter Christen ef5192f8c9 using the generic document parser for crawl starts instead of the html
13 years ago
Michael Peter Christen a02fdf8625 better error messages
13 years ago
Michael Peter Christen eadb58dd87 small enhancements in pdf parser
13 years ago
Michael Peter Christen c6ba44468e timeout = 5000 instead of 3000
13 years ago
reger b616de5973 PDFParser - return at least first 3 pages of PDF
13 years ago
Lotus c73af39e54 refactoring of tray icon class,
13 years ago
Michael Peter Christen 4eff0e26f1 npe bugfix
13 years ago
low012 8776b84c10 *) small fix to make password change function of reconfigureYACY.sh work
13 years ago
Michael Peter Christen 1a0b6b3913 get more navigation details to search results
13 years ago
Michael Peter Christen 7f9b6b7a0c added switches to ConfigParser to accept/deny documents by their
13 years ago
Michael Peter Christen 4901cee3cc suppress auto-tagged subject entries when sending out or receiving
13 years ago
Michael Peter Christen 83009d86f7 added the vocabulary navigator. It can be very simply tested by
13 years ago
sixcooler 985b78cf89 correct 'avaiable()' to use max of young / eden
13 years ago
sixcooler 4da8746275 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
sixcooler c9aaa9e00a respect non-reserved Memory in GenerationMemoryStrategy
13 years ago
Michael Peter Christen 37f2d1b3e9 replaced Thread initialization with ExecutorService pool for delete
13 years ago
Michael Peter Christen a58dc4a91f added autotagging to document condenser:
13 years ago
Michael Peter Christen 0d6176804b emergency disabling of GenerationMemoryStrategy because of non-working
13 years ago
Lotus 411aab02e3 Windows installer now detects reliably whether YaCy runs. A file lock on
13 years ago
Michael Peter Christen 87f0210480 enriched log output to find NPE in HeapReader
13 years ago
Michael Peter Christen 987b412491 updated solr scheme: generic declaration of solr schemes
13 years ago
Michael Peter Christen 254adea51c small fixes
13 years ago
Michael Peter Christen 49be60a7c8 WorkflowProcess is forced to make small pauses if shortMemoryStatus is
13 years ago
Michael Peter Christen b7bb84c0bb set a limit to CharBuffer object size to fight against bad/too large
13 years ago
Michael Peter Christen c602eaaf46 enhanced search process
13 years ago
Michael Peter Christen 087f97d4c0 less noise if a browser cannot be opened
13 years ago
Michael Christen eff966f396 fix for search process (it was aborted too early during remote search)
13 years ago
Michael Christen e6d51363ee Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Marek Otahal a231d0eeb9 Run from Java the whole app YACY
13 years ago
Marek Otahal 72adbeae90 !Important: move from Hashtable to HashMap
13 years ago
Marek Otahal f40efb39af Blacklist loadList() remove duplicates by using Set
13 years ago
Marek Otahal f75b5e40e0 little fix in copy()
13 years ago
Marek Otahal 1dc5d9f0f3 make ConnectionInfo comparable and sort list of connections in Connections_p
13 years ago
Michael Christen fa8da7f89d vocabularies are now also used as source for a did-you-mean computation
13 years ago
Michael Christen eaec14ecc4 Dictionaries from words caches can now be used as autotagging vocabulary
13 years ago
Michael Peter Christen 91940fdf56 redesign of WordCache to be prepared to hold multiple
13 years ago
Michael Christen bd40a10230 added autotagging stub .. only reading and parsing of vocabularies at
13 years ago
Michael Peter Christen 2ee8cbeb2c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 992dbdf4bb added noload statistic to servlets
13 years ago
Michael Christen eebc02f5c1 fix
13 years ago
Michael Christen 216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
13 years ago
stbrumm d18095dc48 Patch for Issue 0000102
13 years ago
stbrumm 9f1b1b4604 Type for Robinson-Mode/Private Peer added
13 years ago
Michael Christen 20962a4ed7 added metadata node stub for metadata from blobs
13 years ago
Michael Christen 575dbbaa93 enhancements in Blob retrieval: try to use less CPU resources by testing
13 years ago
Michael Christen 585a8f3c44 fixed a bug in search sequence (caused empty results)
13 years ago
Michael Christen 361146dd7a better error handling for file loader
13 years ago
Roland 'Quix0r' Haeder 6d4e08ed06 Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME
13 years ago
Roland 'Quix0r' Haeder fa08ed5ae5 Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
13 years ago
Roland Haeder 319fd1f4aa A concurrent access can happen on the blacklist (with latest introduced blacklist check in media snippet computation)
13 years ago
Roland 'Quix0r' Haeder a3083d13bf Blacklist checks are now always turned on, in media searches (e.g. image search) images matching blacklist entries are no longer shown to the user
13 years ago
Michael Christen 52184a1170 fix for search process
13 years ago
Michael Christen 85bd4cc8bc better lookup for peer names
13 years ago
Michael Christen 20e3084bd4 redesign of finding of peers by ip: more lightweight method to read the
13 years ago
Michael Christen 0797b0de99 new handling of remote search processes: looking for seeds will now not
13 years ago
Michael Christen ee9aae5cc0 more about CreativeCommons license vocabulary
13 years ago
Michael Christen ecd74fe34f less dramatic upnp failures
13 years ago
Michael Christen c75e1a3125 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Christen 13f5b5f80d the component part in the YaCy Metadata is filled using the Dublin Core
13 years ago
Michael Peter Christen 8d2cbfb685 more vocabularies and more semantics for lod data structures
13 years ago
Michael Christen 9cd36b4c44 added vocabulary for geolocalization as used in georss
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
Michael Christen 66ab51f89d added rdf vocabulary
13 years ago
Michael Christen c04bfaa51b refactoring
13 years ago
Michael Peter Christen 136b514f52 added a Triple Store based on Nodes that fit to the new storage classes.
13 years ago
Michael Peter Christen 613ab6a69d added BEncodedHeapBag and BEncodedHeapShard which are storage container
13 years ago
Michael Christen 6fecd0db88 one more performance hack to prevent costly md5 computation
13 years ago
Michael Christen e13441b069 better digest pool size (smaller by default but unlimited)
13 years ago
Michael Christen 1f4afb4dc0 performance hacks
13 years ago
Michael Christen 675d557e88 removed debug logging
13 years ago
Michael Christen e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
13 years ago
Michael Peter Christen 4243ace863 added phonetic classes
13 years ago
Michael Peter Christen 0bcef2d156 added feature as requested in
13 years ago
Michael Christen 204c29f010 small bugfixes for search result display and cache display
13 years ago
Michael Christen 17f962fceb translator updates:
13 years ago