theli
5b114249ce
*) Bugfix for ViewLog problem with multiline logging messages
...
See: http://www.yacy-forum.de/viewtopic.php?t=2972
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2774 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
de5e233766
*) Bugfix for GuiHandler sorting problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2773 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fd94aa4bef
*) Bugfix for IndexOutOfBound in GuiHandler
...
*) Bugfix for reversed order displaying of messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2772 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
29a1318ef9
bugfixes for wrong database access that do not consider deleted entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2767 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cbb1e710b9
*) removing old class
...
- was replaced by plasma/urlPattern/defaultURLPattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2765 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c6d46f7ebd
null pointer bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2761 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
decb09df6d
*) Trying to be more tolerant against wrong charset names
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2760 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e9afe39cbb
*) Trying to be more tolerant against wrong charset names
...
See: http://www.yacy-forum.de/viewtopic.php?p=26662
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2759 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7526c831a8
*) Suppressing stracktrace
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2758 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
50f2578c55
- some bugfixing and code cleanup
...
- now assortments can completely left out if they do not exist
before startup and collection index is selected.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bdf4c7c51e
added missing files for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2756 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a5dd0d41af
- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
...
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be splitted into different tables for each month
this solves several problems:
- the biggest table in YaCy is splitted in different parts and can
also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl und
deleted
- The complete database can be limited to a specific size (as wanted many times)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
130cc76927
loop detection and termination in deletedHandles method
...
see also: http://www.yacy-forum.de/viewtopic.php?p=26655#26655
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2754 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
octoate
1c4076da8a
First version of the MS Powerpoint parser based on Apache POI
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2753 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5b75d64d7d
*) bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2750 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
71ed104bc7
*) adding additional rpm mimetype (used by packman)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2749 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
76d959122b
new constants, finals, Stringbuffer, cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2748 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6396f5971e
bugfixes and migration attempt toward new kelondroFlex db
...
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for Crawl URL stacker
- default for Crawl URL stacker is now FLEX (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
48f81acc0e
reverse SVN 2744, it is not needed
...
(this resulted from a small misunderstanding of the newest cache layout)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
1da9aece12
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
918b59dc5e
- bugfix for snippet profile (no delete button)
...
- bugfix for search process (avoided null pointer exception in case other peer does not respond)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2742 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2bb529cedb
added peer tags for peers in robinson mode
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2741 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
afbb547f3d
extended options for abstracts generation in remote search interface
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2739 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
22649408ad
*) Better errorhandling for charset encoding problem during content parsing
...
See: http://www.yacy-forum.de/viewtopic.php?t=2952
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9c7e3f061
*) Bugfix for NoSuchElementException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f25f61d9d3
documentation of compile problem. See
...
http://www.yacy-forum.de/viewtopic.php?p=26407#26407
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2734 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c8f3a7d363
added snippet-url re-indexing
...
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
2cfd4633ac
*) even better handling of searchwords in snippets, words can consist of letters and numbers now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2732 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b062847797
fix for
...
http://www.yacy-forum.de/viewtopic.php?p=26439#26439
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2731 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
e17fea7015
files in htcache are now stored in different hash/tree subdirectories
...
according to storage method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
661f005214
fix for seed upload build script
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2729 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
2d3b7251a4
*) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2728 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ddf8f220f6
fix for build fail
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2727 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
25ae3d3161
generalized definition of hexhash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
86047f439d
removed very bad bug that prevented production of any remote search result
...
:-(((
Please update!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2724 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f0d747c723
removed deprecated method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5ff77612ac
bugfix for old WORDS storage method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0f10bdde22
more generic cache methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
72482b1426
fixed scraper
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2720 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
6557112d8f
small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
440c6ee657
Implement alternative htcache layout
...
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
226f2c5b2c
first version, of the Serverlet Debugger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
adf1f74ab2
bugfix for java 1.5 compile problem with serverCharBuffer.append(char)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2716 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
fd61209797
lines inside tags without punctuation are extended by a single dot.
...
This enables the condenser to distinguish the lines in a better way.
The result is a better preparation of snippets.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
1d0c0edda3
first version of posts/get from the del.icio.us api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1969522dc1
removed lowercase of snippets (and other things):
...
- added new sentence parser to condenser
- sentence parsing can now handle charsets
to do: charsets must be handed over to new sentence parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
43614f1b36
bugfix in collection index. the index for collections was not created correctly
...
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies only to you, if you are using the new collection index.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1dfab1abe3
more control for seed receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2709 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1c0e65f55f
*) Bugfix for problems with charset detection
...
See: http://www.yacy-forum.de/viewtopic.php?p=26196
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2708 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
db294687ea
enhanced logging
...
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9a0f51303
*) suppressing InterruptedException errormessage
...
See: http://www.yacy-forum.de/viewtopic.php?t=2915
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ce7ee74316
*) better errorhandling in filehandler (try catch block now starts before argument parsing)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1d4fb680ce
*) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB
...
TODO: make this limit configurable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1586d57187
*) odtParser: better handling of large files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
f17ce28b6d
*) plasmaHTCache:
...
- method loadResourceContent defined as deprecated.
Please do not use this function to avoid OutOfMemory Exceptions
when loading large files
- new function getResourceContentStream to get an inputstream of a cache file
- new function getResourceContentLength to get the size of a cached file
*) httpc.java:
- Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
- new option to hold loaded resource content in memory
- adding option to use the worker class without the worker pool
(needed by the snippet fetcher)
*) plasmaSnippetCache
- snippet loader does not use a crawl-worker from pool but uses
a newly created instance to avoid blocking by normal crawling
activity.
- now operates on streams instead of byte arrays to avoid OutOfMemory
Exceptions when operating on large files
- snippet loader now forces the crawl-worker to keep the loaded
resource in memory to avoid IO
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
- keep resource in memory whenever possible (to avoid IO)
- when parsing from stream the content length must be passed to the parser function now.
this length value is needed by the parsers to decide if the parsed resource content is to large
to hold it in memory and must be stored to file
- AbstractParser.java: new function to pass the contentLength of a resource to the parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
630a955674
read snippets from cache in case they are not provided in RAM
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bcf2b800b4
applied UTF-8 encoding parameter to yacy-internal protocol communication
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c40fca08a2
fixed bad handling of string separation
...
you can now use a new encoding attribute to create strings from byte arrays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5a40ea7866
refactoring of wget string list generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
dbc2e039bb
added time-out option parameter to call hierarchy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d4c239e4be
- fixed problem in collection index with deletion of single url references
...
- added automatic deletion of not-found snippets after search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
00746ca232
identified and fixed search performance problem caused by
...
snippet loading. Some access to header-db had been twice and even
more times in some cases. Snippet resource loading fixed.
Furthermore the snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
new flag 'includesnippet=true'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b033a80750
better control of failure in node seek of kelondroTree
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2686 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
310f1c41cd
added option to see ranking scores in surftipps
...
and some cleanups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a2e3095044
*) Bugfix. Add missing plasmaParserDocument.close() calls
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cd5f349666
*) Better handling of large files during parsing
...
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java; getText now returnes an inputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
Attention: the caller of this function has to ensure that enough memory is available to do this
to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the soaphander is not installed
*) pdfParser.java:
- better handling of documents with exotic charsets
- better handling of large documents
- better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
to this object as byte array or temp file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
8b2ceddb91
*) Displaying servere and warning logging messages in different colors on ViewLog_p.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2678 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
f8ac694e51
*) fixed a bug where searchword in snippets were not displayed bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
df1629b05a
- code cleanup
...
- version 0.471
- moved surftipps to own web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c665f6cddb
*) handling of quotes in charset string
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2674 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b73efd5565
*) missing changes needed because of last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
140ddba93f
*) adding soap functions to pause and resume the crawler
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2668 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2463e5624a
'quick' release 0.47
...
- documentation update
- necessary bugfixes (missing css for new peers)
- reduced effect of search result redundancy filter
- removed some debug output, but not all
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
49fbb688df
*) SOAP: old urlInfo renamed to urlInfoByHash, new urlInfo Function added.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2662 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
8f143d516b
*) make snippet fetcher accessible via soap api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2661 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
97615af406
*) Restructuring of YaCy SOAP services
...
- general functions moved to abstract service class
- service class splitted into SearchService, CrawlService, StatusService
*) Bugfix for SOAP search services
- Attention: some xml tages where renamed
See: http://www.yacy-forum.de/viewtopic.php?p=25877
*) New SOAP service function urlInfo to view the parsed content of an URL
See: http://www.yacy-forum.de/viewtopic.php?p=25869
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2660 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
241b881560
*) Redesign of YaCy SOAP handler
...
- should be more fail-safe now
- better handling of compressed request bodies
- better handling of persistent connections
- better handling of AxisFaults
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2659 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
009a33170b
*) Content-Location header added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2658 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1aa07a52cd
*) Bugfix for UnsupportedEncodingException if the media type contains multiple parameters
...
See: http://www.yacy-forum.de/viewtopic.php?p=25832#25826
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2654 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
625c2ce6b1
*) bugfix for snippet fetching problem if content but not http header is available in cache
...
See: http://www.yacy-forum.de/viewtopic.php?p=25748
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
813a8a8179
*) migration of mimeTypeParser to jmimemagic 0.1
...
- better mimetype detection for rss feeds
- better mimetype detection for odt documents (less memory consuming)
- two new detector classes implementing MagicDetector interface of jmimemagic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
3f5a4153a0
Make Peers more receptible to transferred indexes
...
- Set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit
so that the inCache gets flushed when the limit is passed
- Modify flushCacheSome to flush enough words to get below MaxWordCount immediately
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
57415b6889
*) Bugfix for surftipps UTF-8 problem
...
See: http://www.yacy-forum.de/viewtopic.php?t=2864
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2647 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
b0a4fcce8c
fix from theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2642 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b6c7b91582
*) Parser now throws an ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
...
*) better logging of parser failures
*) simplified usage of plasmaparser through switchboard
*) restructuring of crawler
- crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher)
*) snippet-fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
e03427871e
enhanced surftipps:
...
- added switchh to show or hide surftipps
- more news contribute to surftipps
- added voting system for surftipps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2638 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1dc12d6659
*) Bugfix for shutdown problem caused by cacheScan thread
...
See: http://www.yacy-forum.de/viewtopic.php?p=25729
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2636 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
42173462f5
rename cutUrlText to shortenURLString;
...
other little things;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2635 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
af1d89e381
check url == null added;
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2634 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cc667b0aa5
*) htmlFilterContentScraper.java: adding support for link tag
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2633 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
26dfbb7499
*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correct on the gui
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cf6acff2c2
*) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the
...
default InputStream Buffer size.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2629 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
f18304ddd3
unused/not needed imports removes;
...
properties added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2628 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ec031eb993
first version of surftipps
...
see http://localhost:8080/index.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2627 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
b174fbd0ca
"import ...*" removed;
...
properties added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2626 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
807756150e
patch for strange bug reported by email
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2625 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5c6251bced
*) some improvements for extended html document charset support
...
- new class htmlFilterInputStream.java which allows to pre-analyze the html header to extract
the charset meta data. This is only enabled for the crawler at the moment. Integration into
proxy needs more testing.
- adding eventlisterner interfaces to the htmlscraper to allow other classes to get informed
about detected tags (used by the htmlFilterInputStream.java)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
33f0f703c0
*) reinserting type cast again
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2623 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8c11a543dc
fixed line ending coding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2622 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b690597275
*) adding casts to avoid compatibility problems between java 1.4 and java 1.5 writer class usage
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2621 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5afb0cbce8
*) setting default charset (for unkown documents) to iso-8859-1
...
*)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f453c14b5d
removed unreacheable catch blocks and unused imports
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ad7f600f25
*) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
97d2a08ef1
*) restructuring needed to support parsing of documents using various charsets
...
- serverFileUtils.java:
-- adding methods to copy from stream to writer and readers to writers
-- moving httpc writeX methods into serverFileUtils class
- serverCharBuffer.java: removing inheritance from Writer class
- replacing htmlFilterOutputStream by htmlFilterWriter class which handles
content as char stream
- htmlFilterContentTransformer.java: deactivating getText mode
(still needs to be migrated to use char streams instead of byte streams)
- changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
- changes in Scraper and Transformer classes to operate on chars instead of bytes
- httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fc594e8eda
*) adding httpContentLengthInputStream.java class to allow reading of http response bodies
...
until EOF even if a persistent connection is used
*) httpdByteCountInputStream.java: adding skip method
*) httpHeader.java: adding getCharacterEncoding function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2616 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
cd636eb00e
*) Fix for the fix...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2615 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
f9a5b55a9e
*) Fixed bug described in http://www.yacy-forum.de/viewtopic.php?p=25448#25448
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2614 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
3aac5b26da
- added automatic tag generation when a web page from the search results is added
...
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
8a30c5343d
*) Fixed bug where exclamation marks could get lost between [=...=] and <pre>...</pre>
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2612 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
d8f4b17e31
*) Hopefully fixed bug described in http://www.yacy-forum.de/viewtopic.php?t=2825 .
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2611 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
0e84a969d6
*) Bugfix for serverCharBuffer read from file operation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2607 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
90ef19d778
*) first version of a serverCharBuffer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2606 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d374ef2bbe
bugfix for tryRemoveURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2605 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f644a1c3a7
better evaluation of index abstracts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2604 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1b48473bc5
bugfix to utf8 recognition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2603 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
90f7241b59
serverByteBuffer.trim() can now recognize utf-8 characters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2602 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
2fd610b556
http://www.yacy-forum.de/viewtopic.php?p=25611#25611
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2601 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e34d9b3fec
*) charset aware headlines (after the serverByteBuffer.trim problem is solved)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2599 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
8115ac47b5
*) charset aware metadata parsing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2598 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3ac30bdf22
*) some todo markers added for additional charset support
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2597 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
06fa891152
*) htmlFilterContentScraper.java: using proper charset for document title
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2595 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
74c3e7cf29
*) storing document charset into plasmaParserDocument object (is needed later by the condenser)
...
*) htmlFilterContentScraper.java: using proper charset for document title
*) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c5d3020941
*) better errorhandling for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2592 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
d0a5a53789
*) changes needed for multi-language support
...
- parsers may need to know the charset of the byte stream
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d82875c72b
removed removal of 'funny symbols' that may have caused utf-8 problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2589 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
26ab1fa885
fixed null pointer exception
...
See http://www.yacy-forum.de/viewtopic.php?p=25598#25598
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2588 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b0e8ff6eda
*) some TODO makers for UTF-8 problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2586 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
41e27b85b7
fix for crawler condition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2583 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0ee7e45413
bugfix for merge method (caused by bad refactoring)
...
see http://www.yacy-forum.de/viewtopic.php?p=25529#25529
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2581 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5c2f30eaca
adjustments to dhtInCache write
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2579 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
9ecf7f0da2
*) some TODO makers for UTF-8 problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2578 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e2f8339827
*) some bugfixes for UTF-8 related problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2577 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c89d8142bb
replaced old 'kCache' by a full-controlled cache
...
there are now two full-controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to the own peer are stored to dhtIn. It is furthermore
ensured that received indexes are not again transmitted to other peers
directly. They may, however be transmitted later if the network grows.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6e2907135a
bugfixes for remote search server part
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
cf9884e22b
first attempt to implement a secondary search
...
this is a set of search processes that shall enrich search results
with specialized requests to realize a combination of search results
from different peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
2a06ce5538
*) next bugfix for UTF-8
...
- Sending UFT-8 messages to other peers did not work
- httpd.java: minor corrections for UTF-8
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2570 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
bdc51591ae
*) UTF-8 Bug solved (hopefully)
...
See: http://www.yacy-forum.de/viewtopic.php?p=25522
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2569 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ef751b9d33
*) removing all string operations from the template engine
...
- engine should fully operate on bytes now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2567 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
7ef80c1026
more debugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2566 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b251076e64
avoid ConcurrentModificationException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2563 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
75b198bc02
- updated references to indexContainer
...
- more bugfixes and debugging for indexAbstract processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0bed3b9ac3
removed superfluous interface
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2554 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b7e7808ea6
wordmigration now works also for new index database
...
if the new database is switched on, no 'too big' messages appear,
all the WORDS files can be completely migrated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2553 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a0ddf2ec11
*) AbstractCrawlWorker.java: delete already downloaded data on crawling error
...
*) plasmaSwitchboard.java: log unexpected errors while parsing/indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2552 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4f9e42d5ed
more changes towards better join-search
...
- fixed problems with index-abstract generation
- added analysis output for index abstract receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a7281a9b4d
fix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
82a6054275
- fixed bug with new indexAbstract generation
...
- added partly evaluation of indexAbstracts during remote searches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fded1f4a5d
*) better handling of maximum file size limit in crawler
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
416b4e5c6b
ups
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2542 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
309accb983
memory control for ymage generation:
...
the ymageMatrix initializer throws an RuntimeException if there is not
enough memory available to generate a new ymage of wanted size
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2541 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
74d1dea30b
changes towards better join-search
...
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ae4e8ce03e
- cut for 'probably last html-interface version': version number update
...
- small enhancement to ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2536 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
64bed59ee8
enhancements to ranking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2535 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
63893003be
*) Adding settings page for the crawler which allows to specify a file size limit and the timeout to use.
...
*) adding first version of maximum filesize check for the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
06b1365066
*) fixed existing protection against divbyzero and removed the new one
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2530 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
94d7ced900
fix for last ranking commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2529 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
cc97a3e9c6
fixed possibly bug with indexOutOfBoundsException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2528 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
03835c2ee8
enhanced search result computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2527 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
809960ddc6
avoid division by zero
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2526 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ac3419b65f
better debugging for indexOutOfBoundException bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2525 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
75b03a4580
fix for new ArrayIndexOutOfBoundException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2524 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a8bc768206
enhancements to ranking evaluation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
auron_x
a82e926c5d
*) fix for wrong totalPPM-calculation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2522 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
33898ae7e9
*) ResourceInfoFactory.java: Bugfix for classNotFoundException
...
See: http://www.yacy-forum.de/viewtopic.php?t=2797
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2521 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
406e170e25
*) more verbose error message
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2519 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b298474e22
*) Bugfix needed because of changed plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2518 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c2e6cc8c6b
small part of Bosts patch
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2517 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
96c6e4e322
- enhancements to detailed search page
...
- enhancements to search ranking computation process
- removed bugs in postranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
9340dbb501
fixed all possible problems with nullpointer exception for LURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a5ed86105b
*) bugfix for handling of ResourceInfo object in proxy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
ff4362b02d
some more fixes for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
7aeadbe7cc
another NullPointerException in http.ResourceInfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2510 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
141f9e5bb4
fix for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2509 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1e7fd48afd
added size method to ftpc
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2508 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
087f7511f8
prevent NullPointerException in http.ResourceInfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2507 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
a2525072f2
bugfix for kelondroRow - property generation
...
this bug affected ranking parameters :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hydrox
59a5511dbb
*) added missing static Strings as requested by theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2505 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
6578564c9a
*) Ignore more hop by hop http headers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b44514242a
*) crawler/ftp/CrawlWorker.java: better errorhandling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2503 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7d7f30139c
*) crawler/ftp/CrawlWorker.java: delete old cache file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2502 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4ae0f122f8
*) ResourceInfo.java: License header added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2501 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
043edfa4d8
*) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
...
*) ftp/CrawlWorker.java better errorhandling for ftp crawler
*) plasmaCrawlEURL.java: some errorcodes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
4866868c0e
added write cache for LURLs
...
This was necessary to speed up the index receive process during global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8a0e35618b
enhancements to search result preparation
...
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timout limits
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5c1bb53d2a
Missing description for last commit
...
*) next step of restructuring for new crawlers
> HTCaching should now work protocol independent
-- introduction of new ResourceInfo objects containing protocolspecific metadata
of a resource.
-- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX,
shallStoreCacheForXXX in a protocol dependent manner
> Indexing should also work protocol independent now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
dae763d8e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
4825bfaaf3
*) Bugfix for PrintWriter Problem
...
See: http://www.yacy-forum.de/viewtopic.php?t=2792
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2494 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d4c5e2af01
html-dirlist can now also be generated from existing connections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2493 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7930839594
*) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path
...
*) CrawlWorker.java: using new dirhtml function of ftpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
17ba468165
added html dirlisting generation in ftpc.java:
...
ftpc.dirhtml() generates a StringBuffer with a complete web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2491 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
7a35b8e237
*) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ffbf416e76
*) direct access to requestheader of htCache.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3870d615e3
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
393a7d10be
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ab5a9bee66
*) adding some copyright headers
...
*) next step of restructuring for new crawlers
- adding first testversion of ftp crawler class
-- does not create a htCache entry yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5847492537
*) next step of restructuring for new crawlers
...
- IndexCreate_p.java: correcting problems with ftp urls
- URL.java does not cutout the userinfo anymore
(needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de)
- plasmaCrawlLoader.java:
-- hack to re enable https urls
-- adding function getSupportedProtocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
6cce47e217
test of ftp-urls in URL class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2481 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
fce9e7741b
*) next step of restructuring for new crawlers
...
- renaming of http specific crawler settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e3f0136606
*) next step of restructuring for new crawlers
...
- adding function isSupportedProcotol to plasmaCrawlLoader.java
- disabling robots.txt check for protocols other than http(s)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
9ded4e8d5a
*) Bugfix for name resolution in proxy mode
...
See: http://www.yacy-forum.de/viewtopic.php?p=25241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2478 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago