Commit Graph

4304 Commits (7417425e6a6b6d51b85ebeb208db850074ad7d62)

Author SHA1 Message Date
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
15 years ago
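A minimal sketch of the cost difference described in the commit above, assuming a container whose size() must walk every element while isEmpty() only looks at the head. `LinkedBag` is a hypothetical class made up for this illustration and is not YaCy code. In the JDK itself, the `ConcurrentLinkedQueue.size()` Javadoc likewise warns that it is not a constant-time operation.

```java
/** Hypothetical singly linked container used only to illustrate size() vs isEmpty(). */
final class LinkedBag<E> {
    private static final class Node<E> {
        final E value;
        final Node<E> next;
        Node(E value, Node<E> next) { this.value = value; this.next = next; }
    }

    private Node<E> head; // null while the bag holds no elements

    void add(E value) {
        head = new Node<>(value, head); // push onto the front
    }

    /** O(n): counts every node, like size() on a degenerate hash bucket chain. */
    int size() {
        int n = 0;
        for (Node<E> cur = head; cur != null; cur = cur.next) n++;
        return n;
    }

    /** O(1): only checks whether a first node exists. */
    boolean isEmpty() {
        return head == null;
    }

    public static void main(String[] args) {
        LinkedBag<String> bag = new LinkedBag<>();
        System.out.println(bag.isEmpty());   // true
        bag.add("a");
        bag.add("b");
        System.out.println(bag.size() == 0); // false, but both nodes were counted first
        System.out.println(!bag.isEmpty());  // true, without any counting
    }
}
```

Both tests give the same answer; the rewrite only changes how much work is done to get it.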
orbiter f4946eaf27 - better thread dump
15 years ago
orbiter 9743b70d1c disabled keep-alive of server, not really needed for speed but a cause for much trouble and memory occupancy
15 years ago
orbiter 491ba6a1ba - some refactoring in workflow
15 years ago
orbiter 969123385b added json and rss output for image search
15 years ago
orbiter d183f8d980 refactoring (moved code from ContentTransformer to TemplateEngine)
15 years ago
orbiter 23aef43786 - better synchronization in SortStack
15 years ago
orbiter 7b1f5b0430 - better media search ranking
15 years ago
orbiter 4df88a4e7a - fixes for missing or bad hashCode computation
15 years ago
orbiter dbdf2570ba added comparator and more fixes for SortStack/SortStore
15 years ago
orbiter d2938c44a1 - added bmp parser to the document parsers
15 years ago
orbiter 1dff620181 Better implementation of SortStack and SortStore and adaptations in all classes using them, which now implement the necessary Comparable interface and hash code computation.
15 years ago
orbiter fe41a84330 some enhancements in web caching: avoid double loading of response metadata and/or content
15 years ago
orbiter 06d0dcde20 more enhancements to image search
15 years ago
orbiter 4c6312d103 enhanced image search
15 years ago
orbiter 2d8f3ee301 some performance hacks
15 years ago
orbiter 94b2a664f3 - use a static DiskFileItemFactory (one instantiation is enough)
15 years ago
orbiter fd0658ce7c avoid forced execution of InetAddress.getLocalHost() at startup, because that hangs on some strangely configured Linux systems. The Domains.localHostAddresses object is first instantiated with simpler logic and then enriched with more host addresses by a concurrent thread that does not block the startup process.
15 years ago
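The startup change above can be sketched as follows, assuming a thread-safe address set that is seeded with the loopback address (which needs no name resolution) and enriched later by a daemon thread. `LocalAddresses`, `startup()` and `isLocal()` are hypothetical names for this illustration; they are not the actual `Domains.localHostAddresses` code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for local host addresses; not YaCy's Domains class. */
final class LocalAddresses {
    // Concurrent set: callers may read while the background thread is still adding.
    private final Set<InetAddress> addresses = ConcurrentHashMap.newKeySet();

    private LocalAddresses() {}

    /** Seeds cheaply and returns at once; the slow lookup runs on a daemon thread. */
    static LocalAddresses startup() {
        LocalAddresses la = new LocalAddresses();
        la.addresses.add(InetAddress.getLoopbackAddress()); // no DNS involved
        Thread enricher = new Thread(la::enrich, "local-address-enricher");
        enricher.setDaemon(true); // never keeps the JVM alive at shutdown
        enricher.start();
        return la;
    }

    private void enrich() {
        try {
            // This call may hang on odd host configurations, so it is kept off the startup path.
            InetAddress local = InetAddress.getLocalHost();
            addresses.add(local);
            for (InetAddress a : InetAddress.getAllByName(local.getHostName())) {
                addresses.add(a);
            }
        } catch (UnknownHostException e) {
            // Keep the loopback-only seed if the host name cannot be resolved.
        }
    }

    boolean isLocal(InetAddress a) {
        return a.isLoopbackAddress() || addresses.contains(a);
    }

    int knownCount() {
        return addresses.size();
    }
}
```

`startup()` returns immediately with a usable object; answers for non-loopback addresses simply improve once the background lookup finishes.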
orbiter 013f337d3f - avoid unnecessary host name lookups for localhost
15 years ago
orbiter 20c5d78a5c fix for a ConcurrentModificationException
15 years ago
orbiter 5afd9f7a91 fix for crlf writing
15 years ago
orbiter 7144d2df6e added crawlReceipt servlet as individual class to examine OOM problem as documented in
15 years ago
orbiter 2d3c98b742 less computation within synchronized blocks
15 years ago
orbiter 1a146b0d73 added a patch to ignore bad mime-ignore patterns
15 years ago
orbiter 29fe436e36 - fixed post-ranking including prefer mask
15 years ago
orbiter 5399d1e2bc refactoring (reason: get more abstraction to use the blacklist class; for integration in other servlets)
15 years ago
orbiter a97fdb4566 catch for NPE in image parser
15 years ago
orbiter 534182559c removed concurrency hacks from SplitTable because they showed a deadlock-like situation.
15 years ago
orbiter 1fa0ac26e9 better protection against NPEs during search/ranking
15 years ago
orbiter 4c99d4683d possible fix for lost crawl profile handles: clean-up job did wrong measurement to see if crawl is still running.
15 years ago
orbiter cd6745b292 accept rss feeds without channel descriptions
15 years ago
orbiter 08f1cbb125 another update to the pdf parser
15 years ago
orbiter 54c54fb144 get a handle for grep: 'StackTrace'
15 years ago
orbiter 605e896d6c more details for exception catching when parsing pdfs
15 years ago
orbiter 18b21eaffe small fixes to search default values and server logging
15 years ago
lotus 6edc168cfe option to disable dht by memory limit:
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter e3025ee691 - new icon for OAI-PMH loading action
15 years ago
orbiter f0b8db93f0 - more abstraction of serverCore thread access
15 years ago
orbiter 19f31bb043 - moved OAI-PMH source list file from SETTINGS to DICTIONARIES/harvesting
15 years ago
orbiter 2889b9426e missing code for last commit
15 years ago
orbiter b6a8887ff5 better handling of running sessions without explicit hashtable
15 years ago
orbiter 1dc7ea986a added a dynamic keep-alive time-out for http server sessions:
15 years ago
low012 e77c906673 *) minor changes mainly in comments
15 years ago
low012 f1740edbf8 *) added script to change memory settings, password and port (experimental, don't blame me if it messes up your configuration)
15 years ago
orbiter 11f7da06ed - fixes to csv parser
15 years ago
orbiter 9b6762ec2e - added a csv "comma separated values" parser to parse OAI-PMH sources from
15 years ago
orbiter 176e334aa4 fixes
15 years ago
orbiter 2fa6bf440b workflow update to OAI-PMH importer
15 years ago
orbiter b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server, using an evaluation of the resumption token to get the URL for retrieving the remaining records
15 years ago
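A minimal sketch of the resumption-token step, assuming the OAI-PMH 2.0 convention that the follow-up request carries only `verb=ListRecords` and the `resumptionToken` argument, and that an empty token marks the end of the list. Extracting the token with a regular expression is a simplification for this sketch (a real importer would use an XML parser); `ResumptionUrl.next` and the base URL in the test are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that derives the next ListRecords URL from a response. */
final class ResumptionUrl {
    // Matches <resumptionToken ...>TOKEN</resumptionToken>; attributes such as
    // cursor or completeListSize are skipped by [^>]*.
    private static final Pattern TOKEN =
        Pattern.compile("<resumptionToken[^>]*>([^<]*)</resumptionToken>");

    private ResumptionUrl() {}

    /**
     * Returns the URL for the next page of records, or null when the response has
     * no token or an empty one (including the self-closing form), meaning the list is complete.
     */
    static String next(String baseUrl, String responseXml) {
        Matcher m = TOKEN.matcher(responseXml);
        if (!m.find()) return null;        // no usable token element
        String token = m.group(1).trim();
        if (token.isEmpty()) return null;  // empty token: list is complete
        // With a resumptionToken, verb is the only other argument sent.
        return baseUrl + "?verb=ListRecords&resumptionToken="
             + URLEncoder.encode(token, StandardCharsets.UTF_8);
    }
}
```

A harvester loops on `next(...)` until it returns null, indexing each page in between.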
orbiter 350d13e153 very first working version of oai-pmh importer: if given the right url, the importer can read and index listRecord xml files and calculate the right resumptionURL which is then given as next default start point for the importer url input.
15 years ago
lotus 58616d99e4 patch for yacy disk usage detection on lvm host
15 years ago
lotus 79251e6f60 configurable disk space hardlimit for dht
15 years ago
orbiter a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
15 years ago
orbiter 4240785f20 added anti-alias function for line drawing
15 years ago
orbiter 30f108f97d added stub of oai-pmh importer (not working yet)
15 years ago
orbiter 77c99e500f added more control over memory allocation
15 years ago
orbiter 52470d0de4 - fix for xls parser
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago
orbiter 26fafd85a5 - more refactoring
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter a8ce192f63 - shifted main classes to new package net.yacy
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
hermens 0fd9540866 Configuration of HTTPDProxyHandler logging
15 years ago
orbiter cee7a05ff2 - de-serialized the pdf parser
15 years ago
orbiter 9db928ce53 replaced fontbox 0.7.3 with fontbox 0.8.0
15 years ago
orbiter c2272785c7 - fix for xlsx and pptx parsing
15 years ago
orbiter c864901087 - moved httpd.mime to defaults path
15 years ago
low012 8829ec5f18 *) made sure that &nbsp; is replaced with a space and not just deleted in CharacterCoding.java
15 years ago
orbiter 6c347a37eb more options for DocumentIndex
15 years ago
orbiter 6192205533 more final modifier
15 years ago
orbiter 0f6b011e1a fix for new index location and better way to use own classes by reflection
15 years ago
orbiter 7a3bbd950f :-(
15 years ago
orbiter b953f04f90 one more reflection fix
15 years ago
orbiter 77d6604856 fix for npe, see http://forum.yacy-websuche.de/viewtopic.php?p=17727#p17727
15 years ago
orbiter 2a7fe35f92 performance tuning using more final modifiers in the kelondro core
15 years ago
orbiter cb4de9ceee fixed a bug in table iterator (did not recognize elements in write buffer)
15 years ago
orbiter e7f18ba24b refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter bd876eb4b7 moved io classes
15 years ago
orbiter c0e0e1f422 moved blob classes
15 years ago
orbiter 1e4f8b56ed accumulated classes from different packages into the new rwi package
15 years ago
orbiter 194da25a2f moved kelondro index
15 years ago
orbiter 4446acc8cd moved kelondro order
15 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
15 years ago
orbiter ea473e32b8 refactoring
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter 09de5da74a once again a performance hack
15 years ago
orbiter 2f6d88403e
15 years ago
orbiter d2615ea5a8 increased memory for scraper buffer to enhance parsing speed
15 years ago
orbiter 4bbbb74ec4 removed not necessary synchronization
15 years ago
hermens 67e5464cc2 Fix for SVN6380: x[] Arrays are unsuitable Keys for Maps without using a proper Comparator.
15 years ago
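The problem named in the commit above can be reproduced with a short sketch, assuming `byte[]` keys. Java arrays inherit identity-based `equals`/`hashCode` from `Object`, so a `HashMap` does not find an entry through a second array with equal content, while a `TreeMap` given a content comparator does. `ArrayKeys` is a hypothetical class; `Arrays.compareUnsigned(byte[], byte[])` is the JDK method (Java 9+) used for the ordering.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo of array keys with and without a proper comparator. */
final class ArrayKeys {
    private ArrayKeys() {}

    /** Lexicographic order on unsigned bytes; equal contents compare as 0. */
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] k1 = {1, 2, 3};
        byte[] k2 = {1, 2, 3}; // same content, different array object

        // HashMap relies on Object.equals/hashCode, which compare identity for arrays.
        Map<byte[], String> hash = new HashMap<>();
        hash.put(k1, "v");
        System.out.println(hash.get(k2)); // null

        // TreeMap with a content comparator treats k1 and k2 as the same key.
        Map<byte[], String> tree = new TreeMap<>(ArrayKeys::compareBytes);
        tree.put(k1, "v");
        System.out.println(tree.get(k2)); // v
    }
}
```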
hermens aeab8c7917 Prevent failed DHT attempts from overwriting newer peer info
15 years ago
hermens 9324b5b6c5 Enhancements to DHT
15 years ago
hermens e49e2d75fe Limit the time Transmission.Chunks stay in the transmissionCloud by using a Map that iterates entries in insertion order.
15 years ago
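The reason an insertion-ordered map helps here can be sketched as follows, assuming each entry is stamped with its insertion time: iteration starts at the oldest entry, so eviction can stop at the first entry that is still young. `ExpiringBuffer` and its methods are hypothetical names, not the actual `Transmission` or `transmissionCloud` code. Note that `LinkedHashMap` keeps the original position when an existing key is re-inserted, so the sketch removes the key first to move a refreshed entry to the end.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical time-bounded buffer built on an insertion-ordered map. */
final class ExpiringBuffer<K, V> {
    private static final class Stamped<T> {
        final T value;
        final long insertedAt;
        Stamped(T value, long insertedAt) { this.value = value; this.insertedAt = insertedAt; }
    }

    // Iterates in insertion order: the oldest entry always comes first.
    private final LinkedHashMap<K, Stamped<V>> entries = new LinkedHashMap<>();
    private final long maxAgeMillis;

    ExpiringBuffer(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

    synchronized void put(K key, V value, long now) {
        entries.remove(key); // re-insertion alone would not move the key to the end
        entries.put(key, new Stamped<>(value, now));
    }

    /** Drops entries older than maxAgeMillis and returns how many were removed. */
    synchronized int evictExpired(long now) {
        int dropped = 0;
        Iterator<Map.Entry<K, Stamped<V>>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Stamped<V>> e = it.next();
            // Entries are ordered by insertion time, so the first young one ends the scan.
            if (now - e.getValue().insertedAt < maxAgeMillis) break;
            it.remove();
            dropped++;
        }
        return dropped;
    }

    synchronized boolean contains(K key) { return entries.containsKey(key); }
}
```

The caller passes `now` explicitly, which keeps the eviction logic deterministic and easy to test.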
orbiter 92db7c5d07 increased timeout for index retrieval
15 years ago
lotus 386b9f35f6 activated resource observer for windows 7
15 years ago
orbiter 6e0dc39a7d - some fixes to prevent blocking situations
15 years ago
orbiter 51f2bbf04b possible fix for problem in http://forum.yacy-websuche.de/viewtopic.php?p=17655#p17655
15 years ago
orbiter f8371707e5 - possibly better termination for SplitTable
15 years ago
orbiter 87780f2562 produce did-you-mean also for queries with more than one word
15 years ago
orbiter 04a548a1e3 - temporarily integrated the transferURL servlet as a static class instead of as a class that is called using reflection, to investigate the OOM problems in that class
15 years ago
orbiter ea427df944 fixed a worst case situation of the condenser which may cause a temporary full CPU load because of a bad data structure usage
15 years ago
orbiter 3e38035389 fix for interrupted thread during has() property check
15 years ago
orbiter 5bd1c1d205 just added some comments that had been produced to learn about OAI-PMH
15 years ago
orbiter 6aa474f529 - better logging for web cache access and fail reasons
15 years ago
orbiter 3671c37989 added experimental oai-pmh reader and integrated it with the existing dublin core parser
15 years ago
orbiter 58a00205d5 re-activated the emergency close when too many server connections exist
15 years ago
orbiter c57d2070e6 more logging
15 years ago
orbiter a995b95367 tried a fix for the httpd access bug (too many unclosed sessions)
15 years ago
orbiter e1fba41cad better logging
15 years ago
orbiter 2275f885a8 possible fix for concurrency problem
15 years ago
low012 a6a3090c3d *) blacklist cleaner supports usage of regular expressions now
15 years ago
orbiter 5a93807781 improved web cache speed:
15 years ago
orbiter 2e8b2867ff double performance of store method because it avoids one 'has'
15 years ago
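The idea behind the commit above (a store that no longer performs a separate 'has' check) can be illustrated with `java.util.Map`, where `put` already returns the previous value. This is an analogy under stated assumptions, not the YaCy store method: `StoreWithoutHas` is hypothetical, and the one-lookup variant assumes null values are never stored.

```java
import java.util.Map;

/** Hypothetical comparison of a check-then-write store and a single-lookup store. */
final class StoreWithoutHas {
    private StoreWithoutHas() {}

    /** Two lookups: a has()-style check followed by the write. */
    static boolean storeTwoLookups(Map<String, String> map, String key, String value) {
        boolean existed = map.containsKey(key);
        map.put(key, value);
        return existed;
    }

    /** One lookup: put() already reports whether a mapping existed before. */
    static boolean storeOneLookup(Map<String, String> map, String key, String value) {
        // Assumes null values are never stored, so a null result means the key was new.
        return map.put(key, value) != null;
    }
}
```

Both methods return whether the key already existed; the second one simply avoids the extra lookup.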
orbiter afda5b1adc new join method for indexes (not yet used)
15 years ago
orbiter 65b66c2c18 better handling of array files of length 0
15 years ago
orbiter 1957b5797a fix for seed generation
15 years ago
orbiter 432154f725 new strategy for concurrent database index key retrieval
15 years ago
orbiter a11cd9f80f - removed reverse name lookup for http access logging (grr..)
15 years ago
orbiter 2e6bdce086 - added more logging to balancer
15 years ago
orbiter 1171a72006 fix for deadlock as seen in http://forum.yacy-websuche.de/viewtopic.php?p=17521#p17521
15 years ago
orbiter 031e6eefbd some updates to dublin core, metadata browsing, file indexing and parser stability
15 years ago
hermens 62a7341c4d Fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2204
15 years ago
low012 f65bfaa9af *) Removed base tag from error page. This had been added by myself a long time ago as a workaround for some weird behavior of my router, but as it turns out, it does more harm than good in general: If HTTPS is used for communication with YaCy, entering a wrong password led to an error page with a form which would send username and password unencrypted, with the user possibly being unaware of this.
15 years ago
orbiter e4797ebcde fix for http://forum.yacy-websuche.de/viewtopic.php?p=17509#p17509
15 years ago
orbiter efa7fb34f0 better oom-awareness of miss-cache in cache
15 years ago
orbiter 3e9dcfc204 fix for http://forum.yacy-websuche.de/viewtopic.php?p=17504#p17504
15 years ago
orbiter c3a4aee255 some redesign with a possible fix for the ReferenceContainerCache.
15 years ago
orbiter aca8a78eb8 fix for shutdown of DocumentIndex objects
15 years ago
orbiter 23ab6fbca4 - navigation appears at the correct position when opengeodb results are also presented after a search
15 years ago
orbiter 4db34eea73 fix for OOM problem in kelondro Cache
15 years ago
orbiter 8ea1d7ab59 fix for wrong assert condition in search abstract generation
15 years ago
orbiter fbd77bd77c git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6328 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter 54c7cbf1d9 - fast result for local search in case that less than 10 hits exists
15 years ago
orbiter 28d4b921b6 different approach for file search
15 years ago
orbiter f99f86c5c5 added concurrency to file indexing class
15 years ago
orbiter 902d16cf6c fixes to parser
15 years ago
orbiter 4a1c852435 fix in usage of RAM copy for Table objects and some cosmetics in asserts.
15 years ago
lotus dce450e2e0 possible fix for "hung" doc-documents
15 years ago
orbiter e627f75415 one more fix to badwords and stopwords
15 years ago
orbiter 721b88efbd - fixed a problem loading blacklists with new yacycore.jar
15 years ago
orbiter 80d5005044 fixed seed upload methods - replaced reflection with direct instantiation
15 years ago
orbiter 68465c37af added a convenience class to add files into a YaCy index
15 years ago
orbiter 2e41e10ffd - updates to yacyVersion parser (remove old targets)
15 years ago
orbiter 27d00285aa - added a new file reader cache that may serve as full-file-copy of blob database files. This is not yet used
15 years ago
orbiter fd6b9cb7dc refactoring of IO access classes
15 years ago
orbiter d64569aa39 return only recommendations of words that have a greater count than the original word
15 years ago
orbiter 604c37927f used comparator for did-you-mean that uses index sizes for comparison, but:
15 years ago
orbiter a58d9cae7d - show location name in geolocalization search result
15 years ago
orbiter 573d03c7d7 added configuration to enable ram table copy
15 years ago
orbiter 3be54e1891 fix to rule when to use a ram table copy
15 years ago
orbiter 700218846c disabled or removed sleep calls
15 years ago
orbiter 342c5d0fd4 fixed city name detection: finds now also substrings of city names
15 years ago
orbiter 18aa0609ca fix for caching of word hash computation
15 years ago
orbiter a10a6cce45 patch for http://forum.yacy-websuche.de/viewtopic.php?p=17289#p17289
15 years ago
low012 53bbdfd19a *) setting SVN keywords
15 years ago
low012 25f6145934 *) preventing null pointer exception in case an empty search word or only one character is entered or all search words are removed by filters
15 years ago
low012 248f3fd9b5 *) cleaned up code for better readability
15 years ago
orbiter eaddf2d464 - corrected layout of map preview
15 years ago
hermens 4b83875abd Small fixes for the heapCacheIterator in ReferenceContainerCache:
15 years ago
orbiter fd668f531b fixed map layout
15 years ago
orbiter 2740d9dd79 added integration of osm maps for search
15 years ago
orbiter af3a696fc4 added a fast-fail concept in search processes. The search now has better control if all the remote searches may bring any result. If all processes are finished, then all search tasks fail fast.
15 years ago
orbiter ce972ff4ef update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
15 years ago
orbiter 44579fa06d - fixed a problem loading images through yacy's document loader,
15 years ago
orbiter 67eddaec4b changed way to integrate dictionary files:
15 years ago
orbiter d656a94f55 fix for bad paths in dictionary processing
15 years ago
orbiter 3b9aaf9e9f - inserted new library tests inside DidYouMean
15 years ago
orbiter 8c35ffe34c fixes to the dymlib
15 years ago
orbiter bfa273bcc1 added a library provider which holds libraries in static objects,
15 years ago
orbiter 1762a7bcd6 - moved DidYouMean to the data package
15 years ago
orbiter bf8ed00e9e removed debugging code
15 years ago
orbiter ead48c4b25 fix for preparation of search result pages with offset > 10:
15 years ago
orbiter 39a311d608 better care not to lose the merge/dump thread
15 years ago
orbiter 10d3e856b5 better concurrency, less blocking & performance hacks
15 years ago
orbiter 1a9cfd8718 some performance hacks (CPU only, not IO)
15 years ago
orbiter 92407009b2 cleanup
15 years ago
orbiter 0ba1beaf56 separated rwi constraint evaluation from rwi ranking and added concurrency
15 years ago
orbiter ce7924d712 better concurrency for rwi entry parsing during search processing
15 years ago
orbiter b0637600d5 enhanced url constraint computation: better position of constraint check during retrieval process
15 years ago
orbiter 61748285c3 more refactoring of search
15 years ago
orbiter 323a8e733d removed unused classes
15 years ago
orbiter 72e5407115 refactoring of snippet cache
15 years ago
orbiter 0e471ba33b - fixed a bug in fast digest computation
15 years ago
low012 93b2622503 *) repaired and added IM online status indicators
15 years ago
orbiter e7736d9c8d more refactoring: made all variables in SearchEvent private
15 years ago
orbiter 4b92d0b9b7 patch for possible problems with normalization of '/' in URLs. This applies in rare cases when '/' appears in post-properties
15 years ago
orbiter d8ca6e6bf1 more refactoring for search
15 years ago
orbiter fe4a4e3f6b added missing class
15 years ago
orbiter 72ac5bd80f refactoring of search process.
15 years ago
hermens c4d0e22a77 Further speed-up of concurrent DHT-receive
15 years ago
hermens 2fbc0696bf Fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2334
15 years ago
f1ori d515bc11e2 added ooxmlparser
15 years ago
orbiter d9744b1b5d replaced old caching strategy control class with lightweight simplearc
15 years ago
orbiter 8e56c2ace6 fix for fixes from this afternoon
15 years ago
orbiter cf739edc2e fix for possible deadlock, see
15 years ago
orbiter 6354b5e447 removed possible deadlock, see
15 years ago
orbiter 5cc17ccf8a a better caching with less overhead and more appropriate
15 years ago
orbiter 92edd24e70 fixed problem with switching of networks
16 years ago
orbiter 0575f12838 fix for deadlock
16 years ago
orbiter fbfdaf063d - patch to omit IndexOutOfBoundsException when a b64-encoded key appears not to be well-formed. In that case the key is still accepted but rated higher than other regular keys to create a virtual ordering between well-formed and ill-formed keys
16 years ago
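A minimal sketch of the "virtual ordering" described in the commit above: instead of failing with an `IndexOutOfBoundsException` on a character outside the key alphabet, the comparator accepts ill-formed keys and sorts all of them after every well-formed key. The 64-character alphabet and the class name `B64KeyOrder` are assumptions for this illustration; YaCy's actual base64 ordering may use a different alphabet and encoding.

```java
import java.util.Comparator;

/** Hypothetical comparator: well-formed keys first, ill-formed keys after them. */
final class B64KeyOrder implements Comparator<String> {
    // Assumed alphabet for this sketch; its order defines the digit values.
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    static boolean wellFormed(String key) {
        for (int i = 0; i < key.length(); i++) {
            if (ALPHABET.indexOf(key.charAt(i)) < 0) return false;
        }
        return true;
    }

    @Override
    public int compare(String a, String b) {
        boolean wa = wellFormed(a), wb = wellFormed(b);
        // An ill-formed key is still accepted but always ranks above a well-formed one.
        if (wa != wb) return wa ? -1 : 1;
        // Two ill-formed keys: plain string order keeps the total order consistent.
        if (!wa) return a.compareTo(b);
        // Two well-formed keys: compare digit by digit using alphabet positions.
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = ALPHABET.indexOf(a.charAt(i)) - ALPHABET.indexOf(b.charAt(i));
            if (d != 0) return d;
        }
        return Integer.compare(a.length(), b.length());
    }
}
```

Because the two groups never interleave and each group is totally ordered on its own, the combined ordering stays transitive and is safe to use in sorted collections.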
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 634a01a9a4 replaced wget-requests with caching requests
16 years ago
orbiter c6c97f23ad - added cache usage properties to crawl start
16 years ago
orbiter c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup
16 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
f1ori ba2e6de538 fix empty version string again
16 years ago
orbiter 51534df0cb fix for possible synchronization problem
16 years ago
orbiter 4da9042e8a code simplification
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
f1ori 67da20647f * add new odf parser based on sax-xml-parser
16 years ago
f1ori 6d0e6d591b * oops, fix compiler error :(
16 years ago
f1ori 3e5beb1654 * fix for empty version in seedlist
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter 597393db3b changed default visibility of classes/objects in upnp lib
16 years ago
orbiter eea4c17ef2 removed rpm parser
16 years ago
orbiter b332dfad67 - inserted request object into response object, which now carries it instead of generating new objects
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter 8103ccec4c removed compiler warnings in imported classes
16 years ago
lotus 52e371b8f7 suppress warnings for upnplib code
16 years ago
lotus 477807e0e6 * updated jxpath to latest v1.3
16 years ago
orbiter 13c63f4082 a set of small fixes to crawling behaviour
16 years ago
orbiter a564df3984 update to mime types in parsers and httpd.mime
16 years ago
orbiter 43c8defd79 enhanced parser with more extension + mime attributes
16 years ago
orbiter aee35bff6f replaced StringBuffer with StringBuilder in tar lib
16 years ago
orbiter 49bbb9bd45 replaced tar library with integrated apache ant tar lib
16 years ago
orbiter f987fc6b4a added tar classes from apache ant tools
16 years ago
orbiter b2263bc720 enhanced document type recognition
16 years ago
lotus aa38eb5a20 * maxfilesize -1 for infinite filesize
16 years ago
lotus 9cfe89c8fc * process content-length as soon as it is received
16 years ago
orbiter 50cf80056f removed jmimemagic library
16 years ago
orbiter 3f113f38a8 removed unused imports
16 years ago
lotus 9f083bb6b2 check filetype before loading (no more mp4 loading)
16 years ago
f1ori 076ae02c44 * added pl and py to extensions accepted by htmlParser
16 years ago
f1ori d5e51cfd09 * workaround for non-working build property replacements
16 years ago
f1ori f814e0fa81 enable warnings and fix most of it
16 years ago
f1ori 8931c8d6b4 improvements to debianpackage:
16 years ago
low012 fc1dc38b55 *) added spaces to make sure that no words are concatenated by accident
16 years ago
low012 f242e7d7bc *) using Apache POI library to parse Word documents now
16 years ago
orbiter caedd72400 - enhanced logging and exception details for parsers
16 years ago
orbiter 4b74ad0a46 fixed setting of parser configuration servlets
16 years ago
orbiter 57a88d435b redesign of parser mime type detection and parser steering
16 years ago
lotus e15d27bc63 avoiding double/wrong parser errors
16 years ago
orbiter 21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes
16 years ago
orbiter 8ca1f5d400 - some work to integrate the html parser the same way as the other parsers are integrated (not finished)
16 years ago
low012 1ee109761f *) added changes which were lost
16 years ago
orbiter 499723891d removed all non-http daemons; they had not been used and may be a potential security risk.
16 years ago
orbiter 0e8647d62f refactoring of search classes
16 years ago