theli
26dfbb7499
*) Bugfix for UTF-8: URL names are now stored properly in stackcrawl, crawler and indexing queue and should be displayed correctly in the GUI
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
cf6acff2c2
*) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the default InputStream buffer size.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2629 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
f18304ddd3
unused/unneeded imports removed;
properties added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2628 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ec031eb993
first version of surftipps
see http://localhost:8080/index.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2627 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
b174fbd0ca
"import ...*" removed;
properties added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2626 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
807756150e
patch for strange bug reported by email
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2625 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5c6251bced
*) some improvements for extended HTML document charset support
- new class htmlFilterInputStream.java, which allows pre-analyzing the HTML header to extract
the charset meta data. This is only enabled for the crawler at the moment; integration into
the proxy needs more testing.
- adding event listener interfaces to the htmlscraper to allow other classes to be informed
about detected tags (used by htmlFilterInputStream.java)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
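The r2624 message describes pre-reading the HTML header to find the declared charset before the document is decoded. Below is a minimal sketch of that idea. It assumes a `BufferedInputStream` with `mark`/`reset`, a hypothetical 4096-byte lookahead and a simple regular expression for `<meta ... charset=...>`. The class `CharsetSniffer` is illustrative and is not the actual htmlFilterInputStream implementation. The read loop also copes with documents shorter than the lookahead, which is the case fixed in r2629.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not YaCy's htmlFilterInputStream.
final class CharsetSniffer {
    // Assumed limit: how many leading bytes are inspected for a <meta> charset.
    static final int LOOKAHEAD = 4096;

    // Matches both <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or null; the stream is reset to its start afterwards. */
    static String sniff(BufferedInputStream in) throws IOException {
        in.mark(LOOKAHEAD);
        byte[] head = new byte[LOOKAHEAD];
        int total = 0;
        // A single read() may return fewer bytes than requested, and short
        // documents hit EOF early, so loop until the lookahead is full or EOF.
        while (total < LOOKAHEAD) {
            int n = in.read(head, total, LOOKAHEAD - total);
            if (n < 0) break;
            total += n;
        }
        in.reset(); // the real parser later sees the stream from its first byte
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII markup
        // survives whatever the document's real charset is.
        String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

Because the stream is reset, the caller can pass the same `BufferedInputStream` to an `InputStreamReader` built with the sniffed charset.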
theli
33f0f703c0
*) reinserting type cast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2623 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
8c11a543dc
fixed line ending coding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2622 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b690597275
*) adding casts to avoid compatibility problems between Java 1.4 and Java 1.5 Writer class usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2621 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
5afb0cbce8
*) setting default charset (for unknown documents) to iso-8859-1
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
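r2620 makes iso-8859-1 the default for documents whose charset is unknown. Below is one way to express that fallback in Java. It assumes the declared charset arrives as a string, for example from an HTTP header or a meta tag. `CharsetFallback` and its methods are hypothetical names, not YaCy code.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical sketch of a "declared charset, else ISO-8859-1" policy.
final class CharsetFallback {
    static final Charset DEFAULT = StandardCharsets.ISO_8859_1;

    /** Resolves a declared charset name; missing, malformed or unsupported names yield DEFAULT. */
    static Charset resolve(String declared) {
        if (declared == null || declared.trim().isEmpty()) return DEFAULT;
        try {
            return Charset.forName(declared.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return DEFAULT;
        }
    }

    /** Decodes raw document bytes with the resolved charset. */
    static String decode(byte[] bytes, String declared) {
        return new String(bytes, resolve(declared));
    }
}
```

ISO-8859-1 is a convenient last resort because every byte value decodes to some character, so decoding never fails. Non-Latin-1 text may still come out wrong.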
orbiter
f453c14b5d
removed unreachable catch blocks and unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ad7f600f25
*) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
97d2a08ef1
*) restructuring needed to support parsing of documents using various charsets
- serverFileUtils.java:
-- adding methods to copy from streams to writers and from readers to writers
-- moving httpc writeX methods into serverFileUtils class
- serverCharBuffer.java: removing inheritance from Writer class
- replacing htmlFilterOutputStream by htmlFilterWriter class which handles
content as char stream
- htmlFilterContentTransformer.java: deactivating getText mode
(still needs to be migrated to use char streams instead of byte streams)
- changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
- changes in Scraper and Transformer classes to operate on chars instead of bytes
- httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
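r2617 converts bytes to chars once, with the document's charset, so that the filter and scraper classes can work on chars. Below is a minimal sketch of the stream-to-writer and reader-to-writer copy helpers it mentions. It assumes a fixed 4096-char buffer. The class and method names are hypothetical and are not the actual serverFileUtils signatures.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helpers illustrating char-based copying.
final class CopySketch {
    /** Copies all chars from source to dest and returns the number of chars copied. */
    static long copy(Reader source, Writer dest) throws IOException {
        char[] buffer = new char[4096];
        long total = 0;
        int n;
        while ((n = source.read(buffer)) != -1) {
            dest.write(buffer, 0, n);
            total += n;
        }
        dest.flush();
        return total;
    }

    /** Decodes a byte stream with the given charset and copies the resulting chars to dest. */
    static long copy(InputStream source, Writer dest, Charset charset) throws IOException {
        // Decoding happens exactly here; everything downstream of dest sees chars only.
        return copy(new InputStreamReader(source, charset), dest);
    }
}
```

The returned count is in chars, not bytes. For multi-byte charsets such as UTF-8 the two differ.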
theli
fc594e8eda
*) adding httpContentLengthInputStream.java class to allow reading of HTTP response bodies
until EOF even if a persistent connection is used
*) httpdByteCountInputStream.java: adding skip method
*) httpHeader.java: adding getCharacterEncoding function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2616 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
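r2616's httpContentLengthInputStream lets a response body be read "until EOF" on a persistent connection. Below is a sketch of the usual technique, assuming the Content-Length is already known. It uses a FilterInputStream that reports EOF after that many bytes and leaves the underlying connection stream open. `ContentLengthLimitedStream` is an illustrative name, and the real class may be structured differently. It also overrides `skip`, in the spirit of the skip method added to httpdByteCountInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: exposes exactly contentLength bytes of the wrapped stream.
final class ContentLengthLimitedStream extends FilterInputStream {
    private long remaining;

    ContentLengthLimitedStream(InputStream in, long contentLength) {
        super(in);
        this.remaining = contentLength;
    }

    @Override public int read() throws IOException {
        if (remaining <= 0) return -1; // body exhausted: signal EOF to the consumer
        int b = super.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        // Never request more than the body still holds, so the next response stays unread.
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override public boolean markSupported() {
        return false;
    }

    @Override public void close() {
        // Deliberately does not close the wrapped stream: the connection is persistent.
    }
}
```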
low012
cd636eb00e
*) Fix for the fix...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2615 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
f9a5b55a9e
*) Fixed bug described in http://www.yacy-forum.de/viewtopic.php?p=25448#25448
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2614 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
3aac5b26da
- added automatic tag generation when a web page from the search results is added
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
8a30c5343d
*) Fixed bug where exclamation marks could get lost between [=...=] and <pre>...</pre>
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2612 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
d8f4b17e31
*) Hopefully fixed bug described in http://www.yacy-forum.de/viewtopic.php?t=2825 .
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2611 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
michitux
2d9496577f
Removed double labels for forms in Blacklist_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2610 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
michitux
aa46269eff
Less margin/padding for dls (e.g. in Messages)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2609 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
michitux
567c40f5f0
Bookmark/delete links are now visible when the mouse is over a search result: via CSS in standards-compliant browsers, via JavaScript in Microsoft Internet Explorer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2608 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
0e84a969d6
*) Bugfix for serverCharBuffer read from file operation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2607 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
90ef19d778
*) first version of a serverCharBuffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2606 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d374ef2bbe
bugfix for tryRemoveURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2605 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f644a1c3a7
better evaluation of index abstracts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2604 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1b48473bc5
bugfix for UTF-8 recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2603 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
90f7241b59
serverByteBuffer.trim() can now recognize utf-8 characters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2602 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
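The r2602 message does not say what trim() originally got wrong. A frequent cause in Java is that `byte` is signed. Every UTF-8 lead or continuation byte (0x80 to 0xFF) is therefore negative and passes a naive `b <= ' '` whitespace test, so multi-byte characters at the edges get stripped. Below is a sketch of a trim that compares unsigned values. It assumes only ASCII control and space bytes (0x00 to 0x20) count as whitespace. `ByteTrim` is a hypothetical class, not serverByteBuffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a UTF-8-safe byte trim.
final class ByteTrim {
    /** Returns a copy of b without leading/trailing bytes in the range 0x00..0x20. */
    static byte[] trim(byte[] b) {
        int start = 0;
        int end = b.length;
        // (x & 0xFF) turns the signed byte into 0..255, so UTF-8 bytes (>= 0x80)
        // are never mistaken for whitespace.
        while (start < end && (b[start] & 0xFF) <= 0x20) start++;
        while (end > start && (b[end - 1] & 0xFF) <= 0x20) end--;
        return Arrays.copyOfRange(b, start, end);
    }
}
```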
allo
2fd610b556
http://www.yacy-forum.de/viewtopic.php?p=25611#25611
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2601 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
rramthun
20e1754379
Various fixes for the languages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2600 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
e34d9b3fec
*) charset aware headlines (after the serverByteBuffer.trim problem is solved)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2599 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
8115ac47b5
*) charset aware metadata parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2598 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
3ac30bdf22
*) some todo markers added for additional charset support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2597 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d54144a4e3
fixed bad snippet behavior (hopefully)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2596 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
06fa891152
*) htmlFilterContentScraper.java: using proper charset for document title
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2595 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5015e780c2
- simplified watchCrawler code
- changed display of watchCrawler slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2594 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
74c3e7cf29
*) storing document charset into plasmaParserDocument object (is needed later by the condenser)
*) htmlFilterContentScraper.java: using proper charset for document title
*) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
c5d3020941
*) better error handling for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2592 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
d0a5a53789
*) changes needed for multi-language support
- parsers may need to know the charset of the byte stream
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
michitux
31d6cdea53
WatchCrawler.html is now valid XHTML; added the class TableCellActive to the default skin, please update your skins (sorry, I removed it before because I hadn't seen it in any HTML file)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2590 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d82875c72b
removed removal of 'funny symbols' that may have caused utf-8 problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2589 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
26ab1fa885
fixed null pointer exception
See http://www.yacy-forum.de/viewtopic.php?p=25598#25598
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2588 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
9bed90f8dc
bugfix in js
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2587 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b0e8ff6eda
*) some TODO markers for UTF-8 problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2586 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
b5904705ab
*) Bugfix for "determineRevisionNr: build.xml:98: SVN entries file does not exist" bug
See: http://www.yacy-forum.de/viewtopic.php?t=2824
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2585 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c42b011648
added watch crawler to menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2584 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
41e27b85b7
fix for crawler condition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2583 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
michitux
92157febcd
Bugfix for Blacklist_p.html: Adding of new patterns possible again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2582 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0ee7e45413
bugfix for merge method (caused by bad refactoring)
see http://www.yacy-forum.de/viewtopic.php?p=25529#25529
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2581 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago