Commit Graph

5002 Commits (2d4bb139d3bdd5f176c03757f1c4dc2a19d6f2b3)

Author SHA1 Message Date
orbiter b77b8cac0c - enhanced html parser: recognized much more details in the content
14 years ago
low012 bc84d2bc9d *) fixed typo in stop script
14 years ago
apfelmaennchen b2281f0b7d YMark: intermediate work towards flexigrid support
14 years ago
low012 06d50fd801 *) fixed stupid bug (introduced in r7663 by myself) which caused wrong parsing of Wiki pages
14 years ago
apfelmaennchen 60412d2bb3 YMark:
14 years ago
orbiter 3d5104d357 - fixed a bug in crawl start with file name (npe in new url)
14 years ago
orbiter fd3baa9025 fix for http://bugs.yacy.net/view.php?id=24
14 years ago
low012 2e9694c9e9 *) removed recursion which hopefully prevents exception
14 years ago
apfelmaennchen a2e86daae9 YMark: more bug fixes
14 years ago
apfelmaennchen 62855f9567 YMark: code clean up and some small fixes
14 years ago
apfelmaennchen 667e912b19 YMark:
14 years ago
apfelmaennchen a0e4960a4d YMark:
14 years ago
orbiter 958ff4778e enhanced location search:
14 years ago
sixcooler 8d63f3b70f just cosmetics - keeping my baby clean :-)
14 years ago
orbiter e402622584 removed httpclient-3.1 (this was added with last commit which was a mistake)
14 years ago
orbiter 19fd13d3bc Added federated index storage to solr.
14 years ago
orbiter c17d102bd8 enhanced speed for OrderedScoreMap inc method and size comparison in concurrent environments
14 years ago
orbiter b788182954 some enhancements to scoring speed
14 years ago
orbiter 01690eab86 fix for mediawiki importer and wikicode parser
14 years ago
orbiter c5352e6872 added new SearchResult class (to be used later)
14 years ago
orbiter 4c013d9088 more UTF8 getBytes() performance hacks
14 years ago
apfelmaennchen 78d6d6ca06 refactoring for ymarks
14 years ago
cominch 9ac02caf00 different initialization of empty variables in alternative constructor. This leads to wrong interpretation of user credentials, resulting in unnecessary "@" in front of host, and different urlhash values.
14 years ago
orbiter a47bdc405b better logging for robinson selection according to peer tag
14 years ago
orbiter cafcb1f9ed removed the DNS resolving for web structure computation from the indexing queue and placed it in a concurrent computation queue that does not block the crawler. Makes crawling faster and less DNS-speed-dependent
14 years ago
orbiter 57ce1fb491 reverted synchronization from SVN 7641
14 years ago
orbiter 17530ca7b5 fix for bug http://bugs.yacy.net/view.php?id=10
14 years ago
orbiter 7c8e764201 removed synchronization again...
14 years ago
orbiter 96c32e87b0 fixes to crawler and new user-agent crawl-delay handling
14 years ago
orbiter b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
14 years ago
orbiter cb6f709a16 - enhancements in surrogate reading
14 years ago
low012 1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
14 years ago
orbiter 564184909a enhanced the surrogate parser: better reading of UTF-8 characters
14 years ago
orbiter 156cf02703 - added an index constraint 'has location' to the condenser
14 years ago
orbiter 41b8d7f655 fix for url normalization (no backpath resolving in post parameters)
14 years ago
orbiter 0430a94eaa the location search now shows only locations that are attached as metadata to web pages, not re-evaluated locations
14 years ago
orbiter 8412f8787d fix for http://bugs.yacy.net/view.php?id=8
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
f1ori efcf37a953 * show info in log, if robots.txt is rejected due to wrong mime-type
14 years ago
lotus cbf87fe72f write PID to yacy.running
14 years ago
low012 16cd919795 *) fixed Exceptions which caused 500 error when entering invalid URL mask or invalid prefer mask, invalid masks are ignored, error message is displayed on yacysearch.html (what about yacysearch.rss and yacysearch.json?)
14 years ago
low012 1a24917cea *) fixed NPE which occurred when empty String was entered as search word
14 years ago
orbiter b1a8d0c020 enhancements to web cache and less strict caching rules
14 years ago
orbiter f3baaca920 - enhancements to DNS IP caching and crawler speed
14 years ago
low012 e7860b1239 *) <mode="Homer">D'oh!</Homer>
14 years ago
low012 82f1580a60 *) trying to fix ConcurrentModificationException
14 years ago
f1ori df71776929 * fix bug #7
14 years ago
low012 9f0286b380 *) fixed potential "java.lang.IllegalArgumentException: Illegal group reference" which occurred if special characters which are also used as metacharacters in regular expressions were used inside of <pre>...</pre> (see: http://veerasundar.com/blog/2010/01/java-lang-illegalargumentexception-illegal-group-reference-in-string-replaceall/)
14 years ago
orbiter 78d4c45d09 enhancement during search process: fast fail of search in case that all index feeders have terminated.
14 years ago
orbiter ba03ca8620 added more configuration options for search:
14 years ago
f1ori e0c7d490f9 * fix bug #6
14 years ago
orbiter a50f28e6e7 - fixed missing save operation for peer name change
14 years ago
orbiter 2b5f8585bf performance hack for Balancer and ip address parsing
14 years ago
orbiter b1d133b69f another enhancement to the ThreadDump function: better multiple dumps and filtering out of uninteresting dump parts
14 years ago
orbiter a35d513bd8 fix for not-deleted .gap and .idx files
14 years ago
orbiter a6935e7dc8 fix for active dns resolving: do not resolve in case that the dns server is not available (offline mode)
14 years ago
orbiter 859c99886c fix for multiple thread dump
14 years ago
orbiter 61acf55da4 avoided using synchronized(this) for the hash computation so that the lock on the object cannot be (accidentally) stolen by another thread; the synchronization now uses the protocol object instead. Also made the protocol object final.
14 years ago
orbiter c2a968c23f fix for bug in formatting in ThreadDump
14 years ago
low012 2861d0888a *) simplified code *) fixed potential NumberFormatExceptions
14 years ago
orbiter 078ecacf61 avoid synchronization in DigestURI hash requests
14 years ago
orbiter 1989ebc24b removed more warnings
14 years ago
orbiter 0324de1467 removed debug line
14 years ago
orbiter 1aba7869bf patch for Windows: do not use the thread lock feature from previous commit if used on Windows
14 years ago
orbiter 0a11727374 added new feature for Thread dump:
14 years ago
orbiter b62b79675b removed type cast warnings
14 years ago
orbiter a07a1a8b1e removed type cast warnings
14 years ago
orbiter 8edaccfedf removed unused variables
14 years ago
orbiter e6c3507b17 disabled some of the previous changes (did not work in openjdk)
14 years ago
orbiter f9e5c21083 update to thread dump logs
14 years ago
orbiter 8f11d3a5bb redesigned the ScoreMap classes:
14 years ago
orbiter a564230c48 more enhancements against blocked threads that occurred in seed age evaluation (blocks httpd in some cases)
14 years ago
orbiter dc0db3550e avoid string conversion
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter 30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
14 years ago
lotus cb6d307bba adding extension for parser
14 years ago
orbiter 1214615185 fix for 'invisible entry', see http://forum.yacy-websuche.de/viewtopic.php?p=22133#p22133
14 years ago
orbiter 3820525464 more memory protection: auto-flush of caches in case of memory shortage
14 years ago
orbiter 7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
14 years ago
orbiter 96bb33ed9b added default size to StringBuffer in logger (and it is not possible to replace the StringBuffer with a StringBuilder...)
14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
14 years ago
low012 3b40b98256 *) set SVN properties
14 years ago
orbiter 2af8e33773 better performance computing search targets with index abstracts
14 years ago
orbiter 619b561a4a enhanced secondary search: index abstracts decompression is now much faster and does not cause strong CPU load after several searches with more than one word
14 years ago
low012 bf27a72d53 *) set SVN properties
14 years ago
low012 b649ce2dd7 *) minor changes
14 years ago
orbiter 27ecdb5444 use less peers for remote search
14 years ago
orbiter 70a996a06c reverted SVN 7557 because these classes are called using reflection. The class declaration is in the log configuration. Without these classes you get errors during runtime and a non-formatted log output, i.e.:
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
low012 9d366ee9d7 *) removed unused code (I assume that most of the code was really dead, but if you need any of the classes, tell me and I will put it back in.)
14 years ago
orbiter 7138f4036b less synchronization, better thread dump tool
14 years ago
orbiter 8d14916c74 more patches for a better out-of-memory management
14 years ago
orbiter c2c5b12882 - even less memory for circle tool
14 years ago
orbiter 6badc5e558 reduce size of static memory usage: use short instead of int in circle coordinates cache
14 years ago
orbiter ce0c8247fc removed (most probably!?!) superfluous System.err output
14 years ago
orbiter 799c534935 one more patch against OOM during secondary remote search
14 years ago
orbiter 77b1e921a9 this assert prevents a network operation in case of sabotage and must therefore be removed
14 years ago
orbiter f8d0454c53 small bug fixes and experiments with search speed enhancement
14 years ago
orbiter bed79402be introduction of a new remote search load control: the remote search has taken 10 results per peer with a time-out of 3 seconds so far. The attributes of number of results per peer and time-out time can now be configured.
14 years ago
orbiter 6dfaf6fef7 fix for bug in deletion of old seeds
14 years ago
orbiter 993b9bc1a8 memory/performance hacks, less synchronization, better concurrency
14 years ago
sixcooler 65bcc60808 stupid me: revert placement of closing connection which caused unclosed connections
14 years ago
sixcooler e3d75d6cd5 Not storing external header in a Header-Array and reduce a loop for its conversion.
14 years ago
orbiter 42d90664f3 - fixed a memory leak in the httpc.post method (no finish)
14 years ago
orbiter 38dce547c0 better concurrency (less locking on date formatting) more logging and minor bug fixes
14 years ago
f1ori 59dea3a284 * implement url proxy, a proxy via the url http://peer:port/proxy.html?url=http://domain.tld/path
14 years ago
mikeworks 8b7b783c49 Tray.java: Broke the build with a wrong non-UTF-8 encoded file and French umlauts (unmappable character for encoding UTF8)
14 years ago
mikeworks db65ada467 Tray.java: Added localization for French tray icon command - although this can probably also be done better than with if statements. (preferably also from the locales file)
14 years ago
orbiter 89d337841c more logging for OOMs
14 years ago
orbiter b1781d7aae some more performance hacks
14 years ago
orbiter b2f147d28e performance hack: excluded map encoding in many cases from synchronization block, especially when doing an iteration
14 years ago
orbiter 5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
14 years ago
orbiter dec24244cf added convenience class to generate UTF StringBody objects with a default UTF8 charset.
14 years ago
orbiter 1110d16af9 performance hack: replaced generic row.getColBytes() call with row.getPrimaryKeyBytes() where the column is 0
14 years ago
orbiter 19b2a50578 - enhanced date formatter cache
14 years ago
orbiter 48a61c39a3 speed hacks in BLOB ArrayStack:
14 years ago
orbiter a92d80a545 performance enhancements using an alternative to an insensitive collator (a complex string compare):
14 years ago
orbiter f2e8ffd768 enhancement in synchronisation
14 years ago
sixcooler bcea497644 next try to fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3193&start=0&sid=b98aa9a7466397602b436eb45f4a9d39
14 years ago
orbiter ad7fcb9d61 Enhanced Base64Order transformation: less overhead (transformation between StringBuilder and byte[])
14 years ago
orbiter f95e50ec3d more explanation
14 years ago
orbiter bb36bf841a emergency commit (sorry sixcooler for not waiting) because without that, automatically updating peers would not be able to do the next update.
14 years ago
sixcooler 8ad4e10491 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=3193&start=0&sid=b98aa9a7466397602b436eb45f4a9d39
14 years ago
orbiter 0ce17d823a - fixed bug in ordering
14 years ago
orbiter dec4f36700 - fix for missing favicons in search widgets
14 years ago
orbiter 804ae2275b - do not delete idx and gap files if the heap is not modified
14 years ago
orbiter e3ef4e3021 - increased default peer ping time from 2 minutes to 1 minute
14 years ago
orbiter 5e45ded8e2 - removed locks from WordReference
14 years ago
orbiter cd19d0517e added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
14 years ago
orbiter d28f8040e0 removed unnecessary recording function that also caused a performance problem after serving too many files
14 years ago
orbiter af87af0d4c - removed synchronization in serverSwitch which should improve speed
14 years ago
orbiter 4bd65532da initialization of libraries concurrently (faster start-up)
14 years ago
orbiter 57e6728cb7 - removed usage of /etc/alternatives/www-browser because of problems with lynx, see:
14 years ago
orbiter 91eeaf2cff fix in ftp client
14 years ago
orbiter e717bf74ba more logging, more care about OOMs
14 years ago
orbiter d84b4a072e healing for some OOM problems
14 years ago
orbiter 4aa406fb0f added log output to find bug in url parser for short hosts
14 years ago
orbiter 82f262f685 - enhanced circle drawing speed
14 years ago
orbiter 29dc416ac6 more animations in graphics. See network and access picture.
14 years ago
orbiter 93b9c4fbc9 added missing file for latest commit
14 years ago
orbiter a80ee9a03d THE GRID is coming to YaCy .. see new animated graphics on http://localhost:8090/AccessGrid_p.html
14 years ago
low012 ce012e11aa *) deleted LogStatistics since the page did not work anymore and it seemed to be obsolete, tell me if you miss it and I will add it again
14 years ago
low012 c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
14 years ago
orbiter d58071947a maybe terminateOldSessions is too slow, removed sleep
14 years ago
orbiter 3e380c51b6 update to browser start with linux
14 years ago
orbiter 6083f2f171 fix for (false) oom
14 years ago
orbiter b35fda43ea more changes to headless mode; now non-headless mode is used when:
14 years ago
orbiter 6c52e31993 new methods to open a browser
14 years ago
orbiter 5892fff51f introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants all robinson peers to be asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased.
14 years ago
orbiter 4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
14 years ago